DOC  FILE  COPY  AD  A 0 4 1 3 ; 




UTEC-CSc-77-090 
Semi-Annual  Technical  Report 


NOISE  SUPPRESSION  METHODS  FOR  ROBUST 
SPEECH  PROCESSING 


Contractor:  University  of  Utah 

Effective  Date:  1 October  1976 

Expiration  Date:  30  September  1978 

Reporting Period: 1 October 1976 - 31 March 1977


Principal  Investigator:  Dr.  Steven  F.  Boll 

Telephone:  (801)  581-8224 


Sponsored  by 


Defense  Advanced  Research  Projects  Agency  (DoD) 
ARPA  Order  No.  3301 

Monitored by Naval Research Laboratory
Under Contract No. N00173-77-C0041




The views and conclusions contained in this
document are those of the authors and should
not be interpreted as necessarily representing
the official policies, either expressed or
implied, of the Defense Advanced Research
Projects Agency or the U.S. Government.


UNCLASSIFIED 


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


REPORT  DOCUMENTATION  PAGE 


1. REPORT NUMBER


4. TITLE (and Subtitle)

Noise Suppression Methods for Robust
Speech Processing


READ INSTRUCTIONS

BEFORE COMPLETING FORM

2. GOVT ACCESSION NO.   3. RECIPIENT'S CATALOG NUMBER




Semi-Annual Technical
1 Oct 1976 - 31 Mar 1977


Dr. Steven F. Boll


9. PERFORMING ORGANIZATION NAME AND ADDRESS

University  of  Utah 
Computer  Science  Department 
Salt  Lake  City,  Utah  84112 


10. PROGRAM ELEMENT, PROJECT, TASK
AREA & WORK UNIT NUMBERS


Project:  76-RPA-3301 


14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)   15. SECURITY CLASS. (of this report)

Naval Research Laboratory

4555 Overlook Avenue, S.W.   Unclassified

Mail Code 2415-A.M.


15a. DECLASSIFICATION/DOWNGRADING
SCHEDULE


16. DISTRIBUTION STATEMENT (of this Report)

This  document  has  been  approved  for  public  release  and  sale; 
its  distribution  is  unlimited. 


17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)


18. SUPPLEMENTARY NOTES


19. KEY WORDS (Continue on reverse side if necessary and identify by block number)


Digital  noise  suppression;  Linear  Predictive  Coding;  Narrow  band  coded 
speech;  Adaptive  noise  cancellation;  Wiener  filtering;  Power  spectrum; 
Autocorrelation. 


20. ABSTRACT (Continue on reverse side if necessary and identify by block number)


Robust speech processing in practical operating environments requires
effective environmental and processor noise suppression. This report describes
the technical findings and accomplishments during this reporting
period for the research program funded to develop real-time, compressed
speech analysis-synthesis algorithms whose performance is invariant under
signal contamination. Fulfillment of this requirement is necessary to
ensure reliable, secure compressed speech transmission within realistic



military command and control environments. Overall contributions resulting
from this research program include an understanding of how environmental
noise degrades narrow band, coded speech; development of appropriate real-time
noise suppression algorithms; and development of speech parameter
identification methods that treat signal contamination as a fundamental
element in the estimation process. Through the appropriate integration of
the developed noise suppression and parameter identification algorithms,
specifications for robust speech processing algorithms will be provided.



TABLE  OF  CONTENTS 


Page 

I.  DD  FORM  1473 

II.  REPORT  SUMMARY 

Section I. Summary of Program for Reporting Period   1

III. RESEARCH ACTIVITIES

Section I. Summary of Overall Research Program   6

Section II. Generation and Calibration of Data Base   11

Section III. Characterization of the Performance of Current LPC Speech Analysis Methods Applied to Noisy Speech   20

Section IV. An Integrated Noise Suppression-Speech Analysis Algorithm: Predictive Noise Cancellation   48

Section V. A Preprocessing Noise Cancellation Algorithm: Dual Input Noise Suppression   54


IV. LIST OF FIGURES   75


SECTION  I 


Summary  of  Program  for  Reporting  Period 


A. Introduction

1. This section summarizes the objectives, tasks and
results of the research program for the period 1
Oct. 1976 through 31 March 1977. Detailed
descriptions are provided in the remaining sections.

B.  Objectives 

1.  Accumulate,  digitize  and  categorize  a representative 
data  base  consisting  of  clean  speech,  noise  and 
noisy  speech  needed  for  measuring  speech  analysis 
algorithm  performance  and  noise  suppression 

algorithm effectiveness.

2. Investigate the time and frequency relationships
between  speech  and  additive  noise  as  well  as  the 
corresponding  analysis  parameter  variations 
resulting  from  the  analysis  of  the  noisy  speech. 

3.  Develop  integrated  noise  suppression  speech  analysis 
algorithms  which  improve  the  quality  and 
intelligibility  of  coded  speech  by  modifying  the 
analysis  equations  to  explicitly  account  for  and 
thereby  suppress  the  noise. 

4. Develop preprocessing noise suppression algorithms
using two microphone inputs which will improve the
signal-to-noise ratio prior to vocoder input.

C. Tasks Undertaken and Results

1. A data base was digitized from recordings of
laboratory  noise;  speech  and  noise  spoken  in  a quiet 
room,  office  and  helicopter  environments;  and  audio 
test  sentences  used  in  NSA  consortium  testing.  In 

addition,  utility  programs  for  measuring  and  scaling 
data  were  developed. 

2.  Interactive  display  and  playback  programs  were 
developed  for  comparing  speech  spectra, 

correlations,  and  analysis  parameters  for  various 
signal-to-noise  ratios  and  environments. 

a.  Our  research  determined  that  the  spectral  and 
temporal  distortions  resulting  from  LPC 
analysis  of  noisy  speech  include: 

(1) Widened formant bandwidths.

(2) Shifted formant center frequencies.

(3) Low energy formants partially or
completely obscured by the noise floor.

(4) Overall decrease in spectral dynamic
range.

(5) Increase in peak factor of voiced
synthetic speech, with an accompanying
increase in annoying "buzzy" quality.

b. In addition, the research determined that the
short time crosscorrelations between speech
and broadband Gaussian noise do not average
to zero. Thus it is incorrect to assume that
speech and Gaussian white noise are
uncorrelated during short time analysis
intervals.

c. Using the LPC parameter comparison program, it
was demonstrated that the spectral distortion, as
measured by the Gray and Markel distance
measure, increases as the signal-to-noise
ratio decreases.

3. An expanded speech analysis method, called
"Predictive Noise Cancellation," was developed to suppress
noise by modifying the speech autocorrelations prior
to LPC coefficient calculation. Estimates of
current noise values were adaptively predicted from
long term noise statistics taken during non-speech
intervals.

a. Predictive Noise Cancellation offers the following
advantages:


(1) It uses procedures which are currently
available in real time LPC vocoders.

(2) It results in a stable synthesis
filter.

(3)  Background  noise  is  reduced  by  10  to  ?0 
dB. 

However, it has two major disadvantages:

(4) The method is dependent upon the phase of
the signals processed.

(5) The estimate of the noise-signal correlation
filter is corrupted when speech is present.

4. A two-microphone input noise cancellation algorithm,
which has been used effectively in the areas of
antenna side-lobe attenuation and data channel
equalization, was implemented and calibrated to
determine its effectiveness in reducing noise prior
to vocoder input. One microphone records
speech plus noise and the other a correlated
noise signal.

a. Preliminary results demonstrated that the
method removes broadband noise that has
been digitally added to speech by subtracting
the adaptively filtered second (noise) channel
from the noisy speech.


b. Signal-to-noise improvements of up to 40 dB were
measured.
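The dual-input method described just above closely matches the classical two-channel adaptive noise canceller. The sketch below assumes that form: an FIR filter adapted by the LMS rule filters the reference (noise-only) channel, and its output is subtracted from the primary (speech plus noise) channel, so the error signal is the speech estimate. The tap count, step size `mu`, the sinusoid standing in for speech, and the three-tap acoustic path `h` are all illustrative assumptions, not parameters from this report.

```python
import numpy as np


def lms_noise_canceller(primary, reference, num_taps=8, mu=0.01):
    """Two-input adaptive noise canceller adapted with the LMS rule.

    primary:   speech plus noise (microphone 1)
    reference: noise correlated with the primary noise (microphone 2)
    Returns (e, w): the error signal e(k), which is the speech estimate,
    and the final filter weights.
    """
    w = np.zeros(num_taps)             # adaptive FIR weights
    taps = np.zeros(num_taps)          # recent reference samples, newest first
    out = np.zeros(len(primary))
    for k in range(len(primary)):
        taps[1:] = taps[:-1].copy()    # shift the delay line
        taps[0] = reference[k]
        noise_estimate = w @ taps      # filtered reference
        e = primary[k] - noise_estimate
        w += mu * e * taps             # LMS weight update
        out[k] = e
    return out, w


def snr_db(clean, processed):
    """SNR of `processed` measured against the known clean signal."""
    residual = processed - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20000
    k = np.arange(n)
    speech = np.sin(2 * np.pi * 0.01 * k)        # stand-in for speech
    reference = rng.standard_normal(n)           # noise at microphone 2
    h = np.array([0.6, -0.3, 0.1])               # hypothetical acoustic path
    primary = speech + np.convolve(reference, h)[:n]

    cleaned, weights = lms_noise_canceller(primary, reference)
    half = n // 2                                # skip the adaptation transient
    print("SNR before: %.1f dB" % snr_db(speech[half:], primary[half:]))
    print("SNR after:  %.1f dB" % snr_db(speech[half:], cleaned[half:]))
```

Because the speech is uncorrelated with the reference, it acts as noise on the weight adaptation; a smaller `mu` leaves less residual noise from weight jitter but tracks a changing acoustic path more slowly.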

E.  Future  Efforts 

1.  Based  upon  the  success  of  the  dual  input  adaptive 
noise  cancellation  algorithm  for  removing  digitally 
added  laboratory  noise,  the  method  will  be  applied 
to  the  removal  of  noise  found  in  office  and 
helicopter  environments. 

2. The inadequacies of the Predictive Noise
Cancellation method can be removed by using a
frequency domain spectral averaging technique.
Although this technique requires Fourier transforms,
it appears to be implementable in real time,
applicable to other vocoder forms such as channel or
homomorphic, and to have better noise cancellation
properties.

3. It can be shown that an all-pole process corrupted
by additive Gaussian noise can be modeled as a
pole-zero process. An investigation is now under
way to determine whether algorithms for estimating
pole-zero processes can be adapted to find the
predictor coefficients corresponding to the
underlying clean speech.
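The claim above, that an all-pole process corrupted by additive noise becomes a pole-zero process, follows in one line from the power spectra. A short derivation, assuming the clean speech is an order-p all-pole process with predictor polynomial A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i} and gain G, and that the noise is white with variance \sigma_w^2 and uncorrelated with the speech in the long-run (ensemble) sense; these symbols are introduced here for illustration:

$$
P_x(z) = \frac{G^2}{A(z)A(z^{-1})} + \sigma_w^2
       = \frac{G^2 + \sigma_w^2\,A(z)A(z^{-1})}{A(z)A(z^{-1})}
       = \frac{\sigma_v^2\,C(z)C(z^{-1})}{A(z)A(z^{-1})}
$$

On the unit circle the numerator equals G^2 + \sigma_w^2 |A(e^{j\omega})|^2 > 0, so it can be spectrally factored into a minimum-phase order-p polynomial C(z) = 1 + \sum_{i=1}^{p} c_i z^{-i} and a scale \sigma_v^2. The noisy signal is therefore a pole-zero (ARMA(p, p)) process whose poles are exactly those of the clean speech, which is why a pole-zero estimator that isolates A(z) could, in principle, return the clean-speech predictor coefficients.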
III. RESEARCH ACTIVITIES


SECTION  I 


Summary  of  Overall  Research  Program 


Program  Objectives 


Primary  Objective 

To develop robust speech processes, based upon the
integration of digital noise suppression methods and narrow
band speech analysis-synthesis methods, capable of realizing
practical, real time methods for effectively processing
speech recorded in practical operating environments.

Support  Objectives 

Specifying noise suppression methods for robust speech
processing will require the following tasks:

1. Accumulation and categorization of signal
contamination associated with practical operating
environments.

2. Categorization of the performance of currently used speech
processing algorithms, e.g. Linear Predictive
Coding (LPC), in these operating environments.

3. Development of real time noise suppression
algorithms and categorization of their
effectiveness in reducing signal contamination.

4. Development of new or modified speech analysis
algorithms which can effectively extract acoustic
parameters from contaminated speech.

5. Specification of robust algorithms through the
integration of noise suppression-parameter
identification algorithms.

6. Documentation and demonstration of robust
algorithm performance using contaminated speech.


Research  Plan 

The research program consists of three parallel but
interactive subprograms: (1) Operating Environment Understanding; (2) Noise
Suppression Algorithm Development; and (3) Speech Processing
Algorithm Development. The study is applied to contaminated
signals generated both in the laboratory and in actual
operating environments. In addition, examples of
contaminated signals have been provided for the program by
the National Security Agency (NSA). Using this data base,
the program addresses signal contaminations associated with
realistic military environments.

Research  Approach 

The program is broken down into four phases. Within
each phase the parallel tasks of environment understanding,
noise suppression algorithm development, and speech
processing algorithm development are carried out.

Initially, the environments and
how they affect the speech analysis methods must be
characterized. This characterization is done in Phase I of the
program. Next, it must be determined how the current
ARPA-NSC speech processing algorithms perform in these
environments. In Phase II the quantitative performance of
the algorithms will be measured using both contaminated and
uncontaminated speech. These measures are obtained by
comparing the output acoustic analysis parameters (such as
pitch, voicing, gain, etc.) estimated using both
contaminated and uncontaminated signals, as well as by comparing
spectral deviations. Examples of these comparisons are
presented in this report.
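The "spectral deviations" mentioned above are reported in Section I with the Gray and Markel distance measure. Gray and Markel examined several spectral distance measures, and which one the comparison programs used is not stated here, so the sketch below assumes the RMS log spectral distance, one of those measures, evaluated on two power spectra sampled on a common frequency grid. The helper names and example predictor values are illustrative only, not taken from the report's CMPARE or DSPLAY programs.

```python
import numpy as np


def rms_log_spectral_distance(p1, p2):
    """RMS difference, in dB, between two power spectra on the same grid."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    if p1.shape != p2.shape:
        raise ValueError("spectra must be sampled on the same grid")
    diff_db = 10.0 * np.log10(p1) - 10.0 * np.log10(p2)
    return float(np.sqrt(np.mean(diff_db ** 2)))


def all_pole_power_spectrum(a, gain_sq=1.0, n_fft=512):
    """G^2 / |A(e^jw)|^2 with A(z) = 1 - sum_i a[i] z^-(i+1), on n_fft//2 + 1 bins."""
    poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return gain_sq / np.abs(np.fft.rfft(poly, n_fft)) ** 2


if __name__ == "__main__":
    clean = all_pole_power_spectrum([1.8, -0.9])   # sharp resonance
    noisy = all_pole_power_spectrum([0.9, -0.3])   # hypothetical broadened fit
    print("distance: %.2f dB" % rms_log_spectral_distance(clean, noisy))
```

A constant gain difference between the two models shows up directly as a constant dB offset, so this measure penalizes level errors as well as shape errors; a gain-normalized variant would compare only spectral shape.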

After the signal contamination, as
well as the algorithms' response to the contamination, has been characterized,
decisions will be made as to how to effectively and
efficiently suppress or eliminate the noise using algorithms
either already implemented or currently being developed.
Thus, noise suppression algorithms will be developed in
parallel with the above tasks.

In Phase III the choice of which algorithm to use, based
upon which type of contamination is present, will be made.
After the appropriate integration of noise suppression and
speech parameter identification algorithms, the resulting
system's performance on undistorted and distorted speech
will be categorized and demonstrated.


In Phase IV the implementation requirements needed to
interface the resulting robust algorithms to the ARPA
network speech communication system will be determined and
specified. Below is a summary of the task orderings.


Summary  of  Tasks 


Phase I  Noise Characterization and Processor
Implementation


1.  Accumulate  and  categorize  signal  contamination 
data  base. 

2. Initiate theoretical investigation of parameter
estimation techniques based on degraded input
speech.

3.  Develop  appropriate  utility  programs  needed  to 
manipulate  data  and  display  essential  features. 

Phase  II  Measurement  of  Algorithm  Performance  Using 
Contaminated  Speech 

1.  Determine  the  performance  of  present,  unmodified 

speech  compression  algorithms  using  contaminated 
speech  and  categorize  results. 

2. Determine the performance of noise suppression
algorithms on undistorted and distorted speech and
categorize results.


3. Continue theoretical investigations of parameter
estimation techniques based upon degraded speech
for suppressing the known noise environments
categorized in Phase I in order to compensate for
present vocoder limitations.

Phase III  Specifications for Robust Speech Processing
Algorithms

1.  For  each  operating  environment,  determine  the 
appropriate  integration  of  noise 

suppression-parameter  identification  algorithm. 

2. Categorize the resulting robust system's
performance on undistorted and distorted speech.

3. Demonstrate and document robust system improvement
and performance for the different operating
environments.

Phase IV  System Implementation and Protocol Specifications

1.  Determine  and  specify  implementation  requirements 
needed  to  interface  robust  algorithms  to  ARPA 
network  speech  communication  system. 

SECTION II


Generation and Calibration of Data Base


A. Objectives

1. Accumulate, digitize and categorize a representative
data base for measuring speech analysis algorithm
performance and noise suppression algorithm
effectiveness.

B. Approach

1. Digitize laboratory generated noise files which
model components found in actual operating
environments.

a.  Wide  band  uncorrelated  Gaussian  noise 

b.  Wide  band  correlated  periodic  noise 

2. Digitize speech and noise recorded in both quiet,
ideal environments and noisy, actual operating environments.

a. "Clean" speech having a negligible additive
noise component.

b. Speech recorded live in a helicopter cockpit.

c. Speech recorded live in a normal office
environment.

3.  Obtain  and  digitize  speech  used  by  National  Security 
Agency  as  part  of  consortium  test  of  narrow  band 


devices. 


4. Calibration of laboratory noise.

a. Measure average energy of noise and clean
speech files.

b. Specify desired signal-to-noise ratio, SNR.

c. Scale noise files by appropriate gain and add
to clean speech files.

d. Generate contaminated speech files having SNR
ranging from -10 dB to 40 dB.

5. Specify and categorize type of signal contamination.

a.  Laboratory  noise  digitally  added  to  clean 
speech . 

b.  Field  noise  digitally  added  to  clean  speech. 

c.  Field  noise  acoustically  added  to  speech  (true 
field  conditions). 
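The calibration of laboratory noise described above (measure the energies, specify the SNR, scale the noise and add) reduces to choosing one gain. If E_s and E_n are the measured energies and S = 10^{S_d/10} is the desired ratio, scaling the noise by c = sqrt(E_s / (S E_n)) makes E_s / (c^2 E_n) = S. A minimal sketch, assuming plain sum-of-squares energies in place of B-weighted ones and equal-length speech and noise arrays; `noise_gain` and `scale_and_add` are hypothetical names.

```python
import numpy as np


def noise_gain(e_s, e_n, snr_db):
    """Gain c such that E_s / (c**2 * E_n) equals the requested SNR in dB."""
    s = 10.0 ** (snr_db / 10.0)          # S = 10^(S_d / 10)
    return np.sqrt(e_s / (s * e_n))


def scale_and_add(speech, noise, snr_db, noise_only=False):
    """Return c*n(k) (noise_only=True) or s(k) + c*n(k) at the target SNR.

    Energies are plain sums of squares here, an assumption standing in
    for the B-weighted energies a weighting program would supply.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    c = noise_gain(np.sum(speech ** 2), np.sum(noise ** 2), snr_db)
    return c * noise if noise_only else speech + c * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = np.sin(2 * np.pi * 0.01 * np.arange(8000))   # stand-in for a speech file
    noise = rng.standard_normal(8000)                     # stand-in for a noise file
    for snr in range(-10, 41, 10):                        # -10 dB ... 40 dB
        x = scale_and_add(speech, noise, snr)
        achieved = 10 * np.log10(np.sum(speech ** 2) / np.sum((x - speech) ** 2))
        print(f"target {snr:3d} dB -> achieved {achieved:.2f} dB")
```

Because the gain depends only on the two energies, each contaminated file in the -10 dB to 40 dB range costs one scale-and-add pass once the energies have been measured.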

C. Tasks

1. Generation of laboratory noise.

a. The output from an analog noise generator was
digitized and recorded at sampling rates of
6.67 kHz, 8.0 kHz and 10.0 kHz.

b. The output of a square wave generator was
digitized and recorded at rates of 6.67 kHz,
8.0 kHz and 10.0 kHz.

(1) The fundamental frequency was both fixed at
400 Hz and varied linearly from 400 Hz to
approximately 1000 Hz.

2. Generation of field data.


a. Using a portable stereo cassette recorder and
a Sony directional microphone, live stereo
recordings were made in the following
environments:

(1) Sensory Information Processing group
"Quiet Room"

(a) Ambient noise level = 27 dB.

(b) Speech recorded in this environment was
used as noise-free "clean" text.

(2) Computer Science Department Office

(a) Ambient noise level = 65 dB.

(b) This speech data represented the office
environment.

(3) Ramjet Helicopter Cockpit

(a) Ambient noise level = 105 dB.

(b) This speech data represented the
helicopter environment.

3. Recordings of the National Security Agency's consortium
audio test tapes.

a. Three speakers in three environments were
recorded.

(1) Environments: Quiet, Office, Helicopter.

b. Helicopter noise without speech was also
recorded.

c. The data were filtered at 3.2 kHz and sampled at
6.67 kHz.

4. Development of utility programs needed to measure
signal energy and adjust signal-to-noise ratios.

a. Program for measuring signal energy:


BWEGHT 


(1)  Program  name:  BWEGHT 

(2) Program authors: W. Done, D. Pulsipher
and J. Youngberg

(3) Program description:


In analyzing the effects of noise on the various
systems being tested, a measure of signal-to-noise ratio (SNR)
is needed that quantifies the degradation caused by various
noise levels. The measure should also match the degradation
the listener intuitively perceives at a given noise level.
Because the final monitor of the systems being tested is the
human ear, the measure selected is based on the B-weighting
curve used for calibration of audio equipment. The B-curve is
a member of a family of curves which, for specific ranges of
energy, indicate sound energy levels throughout the auditory
range which will be perceived as constant loudness levels. The
approximation used for the B-curve is given by:


[Equation for B(f) illegible in the source: a rational function of frequency f approximating the B-weighting curve.]


A plot of B(f), which represents a power spectrum
weighting function, is shown in Figure II.2. A program
called BWEGHT has been written which performs the following:

1) Inputs a frame of speech (or noise) of width NW;

2) Windows that frame with a Hamming window (optional);

3) Calculates the DFT (of order NU) of the frame;

4) Finds the magnitude squared, S(f);

5) Multiplies S(f) by the weighting function, B(f);

6) Calculates the energy in that frame of data, E_i, by

$$E_i = \sum_{j=0}^{N-1} P(f_j), \qquad f_j = \frac{j\,f_s}{N}, \qquad P(f) = S(f)\,B(f),$$

where f_s is the sampling frequency and N is the DFT length;

7) Proceeds to the next frame of speech (or noise);

8) Calculates the average energy per frame by averaging
the E_i found for each frame.

BWEGHT then furnishes an average energy per frame for that
passage. Typical parameter values are:

NW  = 2048  (frame  size) 

NU  = 13  (order  of  DFT) 




b. Program for adding known amounts of speech and
noise:

(1) Program name: SPLUSN

(2) Program authors: W. Done, D. Pulsipher.

(3)  Program  description: 

SPLUSN 

Testing  of  noise  suppression  systems  requires  the  use 
of  data  contaminated  by  known  amounts  of  noise.  For  this 
reason,  a program  was  designed  which  would  scale  a given 
noise  file  to  achieve  a certain  signal-to-noise  ratio  when 
added  to  a speech  file.  This  program,  SPLUSN,  produces  an 
output  file,  x(k),  according  to  either  of  the  following 
equations  (depending  on  whether  the  scaled  noise  or  signal 
plus  scaled  noise  is  desired): 

x(k) = c · n(k)

x(k) = s(k) + c · n(k)


where c is a constant.

The constant c can be entered in one step as a single
number, or as a function of E_s and E_n, the energies contained in
the sequences s(k) and n(k), respectively, and S, the
desired signal-to-noise ratio. The constant c is related to
these parameters by:

$$c = \sqrt{\frac{E_s}{S\,E_n}}, \qquad S = 10^{S_d/10},$$

where S_d is the desired signal-to-noise ratio in dB. E_s and E_n are
obtained from s(k) and n(k), respectively, by using the
program BWEGHT described previously.
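BWEGHT's eight steps can be sketched as follows. Two assumptions replace details that are not legible in the source: the weighting is the standard analog B-curve (poles at 20.6, 158.5 and 12194 Hz, normalized to unity at 1 kHz) rather than the report's own approximation, and NU is read as the base-2 logarithm of the DFT length, so NU = 13 gives an 8192-point DFT with each 2048-sample frame zero-padded. Frames are non-overlapping, and the weighted power is summed over the one-sided bins 0..N/2 and divided by N; this normalization is also an assumption.

```python
import numpy as np


def b_weight_power(f):
    """Analog B-weighting magnitude squared, normalized to 1 at 1 kHz.

    Uses the standard B-curve pole frequencies, not the report's own
    approximation (which is illegible in the source).
    """
    def magnitude(freq):
        freq = np.asarray(freq, dtype=float)
        f2 = freq ** 2
        return (12194.0 ** 2 * freq ** 3) / (
            (f2 + 20.6 ** 2) * np.sqrt(f2 + 158.5 ** 2) * (f2 + 12194.0 ** 2))
    return (magnitude(f) / magnitude(1000.0)) ** 2


def bweight_average_energy(x, fs, nw=2048, nu=13, hamming=True):
    """Average B-weighted energy per frame, following steps 1) to 8)."""
    x = np.asarray(x, dtype=float)
    if len(x) < nw:
        raise ValueError("signal shorter than one frame")
    n = 2 ** nu                                        # assumed DFT length
    weight = b_weight_power(np.fft.rfftfreq(n, d=1.0 / fs))   # B(f) per bin
    window = np.hamming(nw) if hamming else np.ones(nw)
    energies = []
    for start in range(0, len(x) - nw + 1, nw):       # non-overlapping frames
        frame = x[start:start + nw] * window
        s = np.abs(np.fft.rfft(frame, n=n)) ** 2      # S(f), zero-padded DFT
        energies.append(np.sum(s * weight) / n)       # E_i from P(f) = S(f) B(f)
    return float(np.mean(energies))                   # average energy per frame


if __name__ == "__main__":
    fs = 8000.0
    t = np.arange(8 * 2048) / fs
    for f0 in (100.0, 1000.0):
        e = bweight_average_energy(np.sin(2 * np.pi * f0 * t), fs)
        print(f"{f0:6.0f} Hz tone: average weighted energy {e:.1f}")
```

Only ratios of these energies enter the SNR, so the overall scale is arbitrary; what matters is that low-frequency energy, to which the ear is less sensitive, is discounted relative to energy near 1 kHz.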

D. Results

1.  A data  base  has  now  been  recorded  and  digitized 
containing  three  types  of  data. 

a. Laboratory noise digitally added to speech.

(1)  Speech  plus  wide-band  Gaussian  noise  with 
SNR  under  program  control. 

(2)  Speech  plus  periodic  noise  from  a square 

wave  generator  with  the  SNR  under  program 
control. 

b.  Field  noise  digitally  added  to  speech. 

(1)  Speech  plus  noise  from  office  or 
helicopter  environment  with  SNR  under 
program  control. 
c. Field noise acoustically added to speech

(1)  Live  recordings  made  with  portable 
cassette  recorder. 

(2)  NSA  recordings. 

2.  Utility  programs  have  been  developed  for  measuring 

signal  energy  and  controlling  signal-to-noise  ratio, 
SNR. 

E. Future Plans

1. To receive and digitize any additional audio tapes
provided by NRL needed to evaluate noise suppression
algorithms.

SECTION III


Characterization  of  the  Performance  of  Current  LPC 
Speech  Analysis  Methods  Applied  to  Noisy  Speech 

Introduction 

This  section  considers  two  alternative  but 

complementary  methods  for  characterizing  the  effect  of  noise 
on  the  LPC  analysis  of  speech.  First,  time  and  frequency 

relationships  between  the  speech,  noise  and  noisy  speech  are 
computed  and  compared.  Second,  LPC  parameter  variations  due 
to  additive  noise  are  investigated. 
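The second kind of comparison, LPC analysis of clean versus noisy speech, can be sketched with the autocorrelation method: window the frame, form the autocorrelations, solve for the predictor with the Levinson-Durbin recursion, and evaluate the all-pole spectrum. This is a minimal sketch under stated assumptions: a synthetic second-order resonance stands in for a speech formant, white Gaussian noise is added at 0 dB, and the order 10 and 256-sample frame are illustrative choices rather than values taken from this report.

```python
import numpy as np


def autocorrelation(frame, order):
    """Short-time autocorrelation r(0..order) of one (windowed) frame."""
    n = len(frame)
    return np.array([np.dot(frame[:n - i], frame[i:]) for i in range(order + 1)])


def levinson_durbin(r, order):
    """Predictor a[i] with x(k) ~ sum_i a[i] x(k-1-i); returns (a, error)."""
    a = np.zeros(order)
    err = float(r[0])
    for m in range(order):
        k = (r[m + 1] - np.dot(a[:m], r[m:0:-1])) / err   # reflection coefficient
        prev = a[:m].copy()
        a[m] = k
        a[:m] = prev - k * prev[::-1]
        err *= 1.0 - k * k                                 # prediction error update
    return a, err


def lpc_spectrum_db(x, order=10, n_fft=512):
    """All-pole spectrum (dB) from a Hamming-windowed autocorrelation LPC fit."""
    r = autocorrelation(x * np.hamming(len(x)), order)
    a, err = levinson_durbin(r, order)
    poly = np.concatenate(([1.0], -a))
    return 10.0 * np.log10(err / np.abs(np.fft.rfft(poly, n_fft)) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4000
    rp, theta = 0.98, 2 * np.pi * 0.1            # synthetic "formant" pole pair
    c1, c2 = 2 * rp * np.cos(theta), -rp ** 2
    drive = rng.standard_normal(n)
    s = np.zeros(n)
    for k in range(2, n):
        s[k] = c1 * s[k - 1] + c2 * s[k - 2] + drive[k]
    noise = rng.standard_normal(n) * np.sqrt(np.mean(s ** 2))   # 0 dB SNR
    clean_db = lpc_spectrum_db(s[2000:2256])
    noisy_db = lpc_spectrum_db(s[2000:2256] + noise[2000:2256])
    print("dynamic range clean: %.1f dB" % (clean_db.max() - clean_db.min()))
    print("dynamic range noisy: %.1f dB" % (noisy_db.max() - noisy_db.min()))
```

The noise fills the spectral valleys, so the noisy all-pole fit has a smaller dynamic range and a broader peak, which is the kind of distortion listed in Section I.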

Time-Frequency Relations


A. Objectives

1. Investigate modifications to the short time speech
spectrum and corresponding all-pole spectrum caused
by additive noise.

2.  Determine  to  what  extent  the  speech  and  noise  are 
correlated  during  vocoder  analysis  time  segments. 

3.  Examine  the  temporal  modifications  to  the  synthetic 
speech  resulting  from  the  analysis  of  noisy  speech. 

B.  Approach 

1. The intelligibility and quality of vocoder speech
depend directly on how closely the all-pole spectrum
matches the noise-free spectrum. To determine how
noise modifies or distorts the spectral fit from the
noise-free case, a standard LPC analysis was applied
to both speech and speech plus wide band Gaussian
noise at specified signal-to-noise ratios.

2. Classical methods for suppressing noise using linear
filtering usually assume that the
desired signal is uncorrelated with additive
broadband Gaussian noise. This simplifies the
analysis, since the crosscorrelations between speech
and noise can then be set to zero. To determine whether
this assumption remains valid over the short
analysis periods encountered in speech processing,
the short time crosscorrelations and
autocorrelations between speech and broadband
Gaussian noise were computed and compared. If the
crosscorrelations are not negligible, then they must
be accounted for when analyzing noisy speech.

3. Basic to any investigation in speech analysis is the
ability to interactively display and listen to the
synthetic speech derived from the analysis.
Therefore, synthetic speech was generated using
clean and noisy speech at specified signal-to-noise
levels. Critical headphone listening tests were
then made to judge subjective changes in quality and
intelligibility due to the addition of noise.
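The point of approach item 2 follows from an exact frame-level identity: for x(k) = s(k) + n(k), the short-time autocorrelation splits as r_xx(m) = r_ss(m) + r_sn(m) + r_ns(m) + r_nn(m), and the classical assumption drops the two cross terms, which vanish only on average. A minimal sketch, assuming unnormalized frame sums, a sinusoid standing in for speech, and white Gaussian noise at 0 dB; `short_time_corr` and the other helpers are hypothetical names, not the report's programs.

```python
import numpy as np


def short_time_corr(u, v, max_lag):
    """r_uv(m) = sum_k u(k) v(k+m) over one frame, for m = 0..max_lag."""
    n = len(u)
    return np.array([np.dot(u[:n - m], v[m:]) for m in range(max_lag + 1)])


def correlation_terms(s, n, max_lag):
    """All four terms of the short-time autocorrelation of x = s + n."""
    x = s + n
    return {
        "rxx": short_time_corr(x, x, max_lag),
        "rss": short_time_corr(s, s, max_lag),
        "rsn": short_time_corr(s, n, max_lag),
        "rns": short_time_corr(n, s, max_lag),
        "rnn": short_time_corr(n, n, max_lag),
    }


def cross_to_signal_ratio(s, n, max_lag=10):
    """Largest neglected cross term, relative to r_ss(0)."""
    t = correlation_terms(s, n, max_lag)
    return float(np.max(np.abs(t["rsn"] + t["rns"])) / t["rss"][0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for frame_len in (64, 256, 4096, 65536):
        k = np.arange(frame_len)
        s = np.sin(2 * np.pi * 0.05 * k)                  # stand-in for speech
        n = rng.standard_normal(frame_len) * np.sqrt(0.5) # 0 dB SNR
        print(frame_len, round(cross_to_signal_ratio(s, n), 3))
```

Over long records the ratio shrinks roughly as one over the square root of the frame length, which is why the cross terms are usually dropped; over frames of a few hundred samples they remain a noticeable fraction of the signal autocorrelation.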
C. Tasks

1.  Development  of  a General  Purpose  Waveform  Display 
Program 

a.  Program  Name:  DSPLAY 

b.  Program  Author:  Dennis  Pulsipher 

c.  Motivation:  In  order  to  evaluate  the 

performance  of  LPC  analysis,  it  is  essential 
that  time  waveforms  and  their  spectra  be 
available  for  interactive  display  and 

playback.  A flexible,  interactive  graphics 
and  audio  playback  program  was  developed  to 
accomplish  this  task. 

d.  Program  Features:  The  following  is  a copy  of 
the  options  available  for  displaying  data. 

.RUN DSPLAY[21,21]

DISPLAY PROGRAM

EQUAL  SIZE  BUFFERS  FOR  TRACKS  1 AND  2 
MAXIMUM  LOG  LENGTH:  8192 
NOVEMBER  4,  1976  VER.  43 

PLOT  : >? 

TIME  WAVEFORM 

MAGNITUDE  OF  FREQUENCY  RESPONSE 
LOG-MAGNITUDE  OF  FREQUENCY  RESPONSE  SQUARED 
0 (PHASE  OF  FREQUENCY  RESPONSE) 

NOTHING --SET  STEREO  TRACK 
EXPANDED  VIEW 

OUTPUT  TO  DECTAPE  OF  TIME  WAVEFORM 
BRAND  NEW  TIME  WAVEFORM  FROM  DECTAPE 
CHANGED  TIME  WAVEFORM  LENGTH 
UNFORMATTED  TIME  WAVEFORM  FROM  DISK 
SUCCEEDING SEGMENT FROM FILE
FOLLOWING  SEGMENT 
WINDOWED  TIME  WAVEFORM 
MODIFIED  HAMMING  WINDOWED  RESPONSE 
TITLE  GRAPH 
PLAY  DISPLAY 


+ 
ADVANCE (OR BACKSPACE) RECORDS

DESCRIBE  POSITION 

H SUM  OF  SQUARES 

STEREO  SWITCH 

SAMPLING  FREQUENCY 

B-WEIGHTING 

MISCELLANY 

GRAY  & MARKEL  DISTANCE  MEASURES 
SPECTRAL  ESTIMATE  (LPC) 

MISCELLANY  >>? 

LABELS 
NO  LABELS 
DUAL  GRID 
GRID 
LOG 

LINEAR 

MULTIPLE  PLOT 
ONE  PLOT 

IMPULSE/FREQUENCY
TRACK  1 /TRACK2 

Y SCALE 

Y OFF 

X SCALE 
XOFF 

Y LOG 
VEIN 

ALL LABELS

PRIMARY  LABELS 

BIAS 

0 BIAS 

SET  H 

CLEAR  # 

PLOTS 

BLANKS  (NO  PLOTS) 

RETURN 

/ 

RETURN 

e.  Display  examples  generated  by  this  program  are 
presented  in  the  section  on  results. 

2.  Development  of  Spectral  and  Correlation  Display 
Program  Needed  to  Compare  LPC  Analysis 

a.  Program  name:  CMPARE 

b.  Program  author:  William  Done 

c.  Program  Features:  The  following  is  a 
description of program CMPARE:


Calculation of linear prediction coefficients for speech in
the  presence  of  noise  requires  the  development  of  software 
for  simplified  analysis  of  new  noise  cancellation 
procedures.  The  software  should  also  provide  linear 
prediction  coefficients  based  on  contaminated  and 
uncontaminated  speech  as  a standard  of  comparison  for  the 
algorithm  being  evaluated.  A graphics  system,  based  on  a 
linear  prediction  vocoder,  was  developed  to  perform  the 
following  tasks: 

(1) Calculate Mode 1 coefficients a_1(i) from
s(k), the uncontaminated speech;

(2) Calculate Mode 2 coefficients a_2(i) from
x(k) = s(k) + n(k), the contaminated
speech;

(3) Calculate Mode 3 coefficients a_3(i) from
ŝ(k) using the algorithm being tested.

Thus, the a_1(i) represent LPC coefficients obtained from
high quality speech, while the a_2(i) are coefficients from
noisy speech, and represent the quality possible in LPC if
no noise removal is done.

The Mode 3 coefficients a_3(i) are determined by the
noise suppression algorithm being evaluated. This mode can
be changed by changing one subroutine in the software.
Associated with each Mode 3 system is a graphics routine
which allows important sequences of that algorithm to be
displayed.  The  computer  sense  switches  are  used  to  select 
and  load  various  arrays  into  the  graphics  software  for 
plotting  and  determination  of  spectra.  An  example  of  a 

typical  menu  of  arrays  available  for  loading  is  listed 
below. 

MENU  FOR  LOADING 

4 PITCH PROFILE

5 MODE  1 LOSS  FUNCTION 

6 MODE  2 LOSS  FUNCTION 

7 MODE  3 LOSS  FUNCTION 

8 MODE  4 LOSS  FUNCTION 

9 A1 PREDICTORS

10 A2 PREDICTORS

11 A3 PREDICTORS

12 S(K)

13 X(K)

14 N(K)

15  RSS(K) 

16  RXX(K) 

17  RNN(K) 

18  RXS(K) 

19  RSX(K) 

20  RNS(K) 

21  RXN(K) 

23  RNX(K) 

24  DISTANCE  MEASURE 

25  DUMMY 

26  RSSHAT(I):  UNCORR. 

27  RSSCOR(K):  CORR. 

28  ACOR(K):  CORR.  PRED 

# COMMANDS ARE

CLEAR ARRAY - Zero plotting buffers
UNWINDOWED DFT - Compute DFT of a sequence
WINDOWED DFT - Window sequence, compute DFT
LOAD DATA ARRAYS - Load plotting buffers
DISPLAY - Enter display routine
AUTO- & CROSS-CORRELATIONS - Compute those for s(k), x(k), n(k)
FLAG FOR HALT - Set flag to stop at a specific frame
TRANSFORM TYPE - Transforms to be magnitude or log magnitude
MENU FOR LOADING - List menu above
ALL-POLE CROSS SPECTRA - Compute all-pole spectra
QUIT PROGRAM - Exit graphics routine

Operations available in the graphics
routine are listed above with an
explanation of their function. Below are
listed the commands for the display programs.
routine  are  also  listed  above  with  an 
explanation  of  their  function.  Below  are 
listed  the  commands  for  the  display  programs. 


DISPLAY

NPLOTS = 3

>?

GRID TYPE
COMPLEXITY OF GRID
SINGLE PLOT
MULTIPLE PLOTS
DISPLAY SIZE
INTENSITY
XMIN
ABCISSA VALUES
Y VALUES
LIMITS ON Y
PLOT
HORIZONTAL LABEL
VERTICAL LABEL
2 PLOTS
RETURN

2 PLOTS >>?

Y VALUES:
ABSCISSA VALUES:
XMIN:
DISPLAY SIZE:
COMPLEXITY:
LIMITS ON Y:
PLOT:
MULTIPLE PLOTS:
SINGLE PLOTS:
LABEL
SWAP
RETURN
CLEAR

The system as described above is versatile in allowing the researcher to bring a new algorithm into operation quickly, and it can generate spectra of the processed sequences.

3. Comparison of the LPC Spectral Analysis of Clean and Noisy Speech

a. Spectral comparisons: Using the utility programs described above, wide band Gaussian noise was scaled and added to a clean speech file, resulting in an average signal-to-noise ratio of 0 dB. Using these data, DFT spectra and all-pole spectra of the speech, noise, and noisy speech were computed and made available for display. Examples of these spectral comparisons are given in the next section.
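The scaling step can be sketched as follows. This is a minimal NumPy illustration, assuming that "average signal-to-noise ratio" means the ratio of mean signal power to mean noise power over the whole file (a segmental definition would give a different gain). The function name `add_noise_at_snr` and the tone used as stand-in speech are hypothetical.

```python
import numpy as np


def add_noise_at_snr(s, n, snr_db):
    """Scale noise n so that the whole-file SNR of s + n equals snr_db.

    Assumes SNR = 10*log10(mean(s**2) / mean(n_scaled**2)).
    Returns the noisy signal x and the scaled noise actually added.
    """
    s = np.asarray(s, dtype=float)
    n = np.asarray(n, dtype=float)[: len(s)]
    p_s = np.mean(s ** 2)
    p_n = np.mean(n ** 2)
    gain = np.sqrt(p_s / (p_n * 10.0 ** (snr_db / 10.0)))
    n_scaled = gain * n
    return s + n_scaled, n_scaled


# Example: a stand-in "speech" tone plus white Gaussian noise at 0 dB SNR.
rng = np.random.default_rng(0)
fs = 6667.0                                  # sampling rate quoted later in the report
t = np.arange(int(fs)) / fs                  # one second of samples
s = np.sin(2 * np.pi * 500.0 * t)            # placeholder for a clean speech file
x, n0 = add_noise_at_snr(s, rng.standard_normal(len(s)), 0.0)
print(10 * np.log10(np.mean(s ** 2) / np.mean(n0 ** 2)))  # close to 0 dB
```

Only the noise is scaled, so the clean file can still be used as the reference in the later comparisons.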

4. Synthesis Speech Comparisons: Using the 0 dB data base, LPC synthetic speech was generated, displayed, and recorded. To eliminate differences in quality or intelligibility due to pitch and voicing differences between clean and noisy speech, the excitation parameters were computed from the clean speech file. This required modifying the LPC vocoder program to accept two data files simultaneously: clean speech and noise. Pitch and voicing decisions were made from the clean speech, while the LPC parameters and gain were computed from the sum of speech and noise. Examples of the synthesis differences are given in the next section.

5. Crosscorrelation Comparisons: Using the 0 dB data base, the autocorrelations and crosscorrelations required for LPC analysis of noisy speech were computed and made available for display. Of specific interest in this investigation was the determination of how correlated the speech and noise waveforms are within the short time analysis frame used by the vocoder analyzer. The correlations considered are:


Model: $x(m) = s(m) + n(m)$

Clean speech correlations:
$$R_{ss}(k) = \sum_{m=0}^{N-1} s(m)\,s(m+k)$$

Noisy speech correlations:
$$R_{xx}(k) = \sum_{m=0}^{N-1} x(m)\,x(m+k)$$

Speech-noise crosscorrelations:
$$R_{sn}(k) = \sum_{m=0}^{N-1} s(m)\,n(m+k)$$


Assuming the additive noise model, the clean speech autocorrelations needed to solve for the predictor coefficients are given by

$$R_{ss}(m) = R_{xx}(m) - R_{xn}(m) - R_{nx}(m) + R_{nn}(m)$$

or, in terms of the crosscorrelations between the speech s(m) and the noise n(m),

$$R_{ss}(m) = R_{xx}(m) - R_{sn}(m) - R_{ns}(m) - R_{nn}(m)$$
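Both forms of the decomposition can be checked numerically. The sketch below (NumPy; the helper names are hypothetical) uses one 128-sample frame and takes every finite sum over the same index range, in which case both identities hold exactly, term by term, whether or not the cross-terms happen to be small. Random sequences stand in for speech and noise.

```python
import numpy as np


def xcorr(a, b, max_lag):
    """R_ab(m) = sum_k a(k) b(k+m), m = 0..max_lag, over the samples both frames share."""
    N = len(a)
    return np.array([np.dot(a[: N - m], b[m:]) for m in range(max_lag + 1)])


def speech_autocorr_two_ways(s, n, max_lag):
    """Return R_ss and its two reconstructions from noisy-speech quantities."""
    x = s + n
    r_ss = xcorr(s, s, max_lag)
    r_xx = xcorr(x, x, max_lag)
    r_nn = xcorr(n, n, max_lag)
    # Form 1: R_ss = R_xx - R_xn - R_nx + R_nn
    form1 = r_xx - xcorr(x, n, max_lag) - xcorr(n, x, max_lag) + r_nn
    # Form 2: R_ss = R_xx - R_sn - R_ns - R_nn
    form2 = r_xx - xcorr(s, n, max_lag) - xcorr(n, s, max_lag) - r_nn
    return r_ss, form1, form2


rng = np.random.default_rng(0)
N, M = 128, 10                                                # 128 samples is 19.2 ms at 6.667 kHz
s = np.convolve(rng.standard_normal(N), [1.0, 0.9, 0.5])[:N]  # stand-in "speech"
n = rng.standard_normal(N)                                    # stand-in white noise
r_ss, form1, form2 = speech_autocorr_two_ways(s, n, M)
print(np.allclose(r_ss, form1), np.allclose(r_ss, form2))     # True True
```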
Classical linear filtering analysis methods such as Wiener filtering and Kalman filtering assume that signal and noise are uncorrelated and, therefore, that Rsn(m) and Rns(m) will average to zero. Although this may be true for long time averages, the nonstationarity of speech prohibits averaging over more than 20 or 30 milliseconds. For this short time interval the cross-terms must be examined and included in the analysis if their values are substantial. The next section gives representative examples of the cross-terms.
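Why a 20 to 30 ms frame is too short for the cross-terms to vanish can be illustrated with a rough Monte Carlo sketch. It assumes independent white Gaussian sequences for both speech and noise (real speech is neither white nor stationary, so the numbers indicate only the trend). For such sequences the normalized zero-lag cross-term |R_sn(0)| / sqrt(R_ss(0) R_nn(0)) shrinks only like 1/sqrt(N), with an expected value of roughly sqrt(2/π)/sqrt(N), so a single 128-sample frame leaves a residue of several percent. The function name is hypothetical.

```python
import numpy as np


def mean_normalized_crossterm(N, trials=200, seed=0):
    """Average of |R_sn(0)| / sqrt(R_ss(0) R_nn(0)) over independent N-sample frames.

    Speech and noise are both modeled (as an assumption) by independent white Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(trials):
        s = rng.standard_normal(N)
        n = rng.standard_normal(N)
        values.append(abs(np.dot(s, n)) / np.sqrt(np.dot(s, s) * np.dot(n, n)))
    return float(np.mean(values))


rho_short = mean_normalized_crossterm(128)        # one 19.2 ms frame at 6.667 kHz
rho_long = mean_normalized_crossterm(128 * 200)   # about 3.8 s of data
print(f"19.2 ms frame: {rho_short:.3f}   3.8 s average: {rho_long:.4f}")
```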

Results 

1. Introduction: This section presents a number of representative examples of time and spectral relations between speech, noise, and their sum. Based upon a preliminary analysis of the office and helicopter environments, the primary noise component present is broadband white Gaussian noise. Therefore, this type of signal contamination was chosen to determine how LPC analysis degrades. In the following examples, digitized Gaussian noise was scaled and added to clean speech to give an average signal-to-noise ratio of 0 dB.

2. Spectral Distortion due to Additive Noise

a. Major differences between the all-pole spectra of clean and noisy speech include:

(1) Loss of low energy formant information.

(2) Shifted formant frequencies.

(3) Widened formant bandwidths.

(4) Overall decrease of spectral dynamic range.

b. The following figures demonstrate these effects clearly.

(1) Figure III.1 presents a dual plot of the clean speech s(k) and the noisy speech x(k).

(2) Figure III.2 presents a dual plot of (in the upper trace) the spectrum and all-pole spectrum of s(k), and (in the lower trace) the spectrum and all-pole spectrum of x(k).

(3) Figure III.3 presents, in the top trace, multiple plots of the spectrum of the noisy speech x(k), its all-pole LPC spectrum XHAT(k), and the all-pole LPC spectrum of the clean speech, SHAT(k). The bottom trace is the spectrum of the noise N(k) which was added to s(k).

c. Comments: Figure III.3 clearly demonstrates how the all-pole spectrum of x(k) differs from that of s(k). Since the first formant has maximum energy, its approximation is not noticeably modified. However, the second formant of XHAT(k) is shifted and its bandwidth is approximately tripled with respect to the second formant of SHAT(k). The third formant of XHAT corresponds to a high energy peak of N(k) occurring at about 2200 Hz, rather than to the actual peak at 1750 Hz.

Finally, the spectral dynamic range has decreased from about 55 dB to about 20 dB.
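The loss of spectral dynamic range can be reproduced with a small synthetic experiment. The sketch below is an illustration under stated assumptions, not the report's procedure: a fourth-order resonator with hypothetical "formants" at 500 Hz and 1500 Hz stands in for a voiced speech segment, white Gaussian noise is added at 0 dB SNR, and a 10th-order all-pole spectrum (autocorrelation method, Levinson-Durbin recursion) is computed for each signal. The noise fills in the spectral valleys, so the noisy all-pole spectrum spans a much smaller range in dB. All function names are hypothetical.

```python
import numpy as np

FS = 6667.0  # Hz


def autocorr(x, max_lag):
    N = len(x)
    return np.array([np.dot(x[: N - m], x[m:]) for m in range(max_lag + 1)])


def levinson(r, order):
    """Levinson-Durbin: A(z) = 1 + sum a[i] z^-i and the prediction error energy."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def allpole_spectrum_db(x, order=10, nfft=512):
    """10*log10(err / |A(e^jw)|^2) on nfft//2 + 1 frequencies from 0 to FS/2."""
    a, err = levinson(autocorr(x, order), order)
    return 10.0 * np.log10(err / np.abs(np.fft.rfft(a, nfft)) ** 2)


def resonator(n_samples, poles, rng):
    """Drive an all-pole filter with white noise; poles = [(radius, freq_hz), ...]."""
    a = np.array([1.0])
    for r, f in poles:
        theta = 2.0 * np.pi * f / FS
        a = np.convolve(a, [1.0, -2.0 * r * np.cos(theta), r * r])
    e = rng.standard_normal(n_samples)
    y = np.zeros(n_samples)
    for k in range(n_samples):
        past = y[max(0, k - len(a) + 1):k][::-1]      # y(k-1), y(k-2), ...
        y[k] = e[k] - np.dot(a[1:1 + len(past)], past)
    return y


rng = np.random.default_rng(0)
s = resonator(4000, [(0.98, 500.0), (0.95, 1500.0)], rng)
n = rng.standard_normal(len(s))
n *= np.sqrt(np.mean(s ** 2) / np.mean(n ** 2))       # scale for 0 dB SNR
clean_db = allpole_spectrum_db(s)
noisy_db = allpole_spectrum_db(s + n)
range_clean = clean_db.max() - clean_db.min()
range_noisy = noisy_db.max() - noisy_db.min()
print(f"dynamic range: clean {range_clean:.1f} dB, noisy {range_noisy:.1f} dB")
```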
Figure III.1: Speech, s(k), and Speech plus Noise, x(k)


3. Modifications to the synthetic speech waveform

a. The corresponding temporal distortions resulting from LPC spectral analysis of noisy speech are:

(1) Absence of "ringing" due to loss of low energy formants.

(2) Increase in "buzzy" quality due to spectral flattening.

(3) Increase in pitch and voicing errors.

b. The following figures demonstrate these effects:

(1) Figure III.4(a) presents the synthetic waveform and its all-pole spectrum using clean speech s(k).

(2) Figure III.4(b) presents the synthetic waveform and its all-pole spectrum using noisy speech (0 dB SNR).

(3) Note: these examples are taken from another time window of the same speaker.

c. Comments: The severely overdamped character of the noisy speech synthesis clearly demonstrates why it will sound more buzzy and be less intelligible than the noise-free synthesis.
Figure III.5: Autocorrelation of Clean Speech, RSS(k), and Crosscorrelation between Speech and Noise, RSN(k)

Figure III.6: Autocorrelation of Noisy Speech, RXX(k), Autocorrelation of Noise, RNN(k), and Crosscorrelation, RNX(k)


4. Short time correlations between speech, noise, and noisy speech.

a. Using data corresponding to an analysis window length of 19.2 ms, the autocorrelations and crosscorrelations between speech, noise, and noisy speech were computed and displayed.

b. Figure III.5 presents a dual plot of a representative example of the short time autocorrelation of the clean speech, RSS(k), and the short time crosscorrelation of the clean speech and noise, RSN(k). (The data used are shown in Figure III.1.)

(1) where

$$R_{ss}(k) = \sum_{m=1}^{128} s(m)\,s(m+k), \qquad k = 0, 1, \ldots, 10$$

$$R_{sn}(k) = \sum_{m=1}^{128} s(m)\,n(m+k), \qquad k = 0, 1, \ldots, 10$$

c. Figure III.6 presents a triple plot, for the same data, of the autocorrelation of the noisy speech RXX(k), the autocorrelation of the noise RNN(k), and the crosscorrelation between the noisy speech and the noise RNX(k).

d. Comments: These figures clearly show that, over the short averaging intervals imposed by the brief stationarity of the speech signal, the cross-terms are not given a chance to average to zero, and thus they cannot be ignored.

LPC  Parameter  Variations  Due  To  Additive  Noise 

Objectives

1. Compare LPC analysis parameter variations versus signal contamination.

Approach 

1. Using the laboratory noise data base described in Section II, noisy speech files with specified average signal-to-noise ratios were created.

2. Using these calibrated noisy speech files, time histories of the LPC analysis parameters were generated and saved.

3. A general purpose parameter comparison program was written to examine, display, and summarize parameter variations versus signal-to-noise level.

4. Initially the parameters were computed for the undistorted speech for each analysis frame and stored as a reference parameter file. The parameter computation was then repeated on the degraded speech. The resulting parameter files were then compared on a frame by frame basis.

5. As noise suppression algorithms are developed, their ability to improve vocoder performance will be measured empirically by comparing the analysis parameters from the noise cancelled process with those generated from clean speech.
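The frame-by-frame comparison of step 4 can be illustrated for the pitch track. The sketch below is hypothetical and is not NUCMPR itself: it takes two per-frame pitch-period histories in milliseconds, with zero marking an unvoiced frame, and tallies the kind of pitch statistics NUCMPR is described as reporting: gross pitch errors, voiced-to-unvoiced and unvoiced-to-voiced errors, and the mean and standard deviation of the remaining fine pitch errors. The gross-error threshold is a parameter; its 10 ms default follows the figure printed in the program description and may not match the original program exactly.

```python
import numpy as np


def pitch_comparison(ref_ms, test_ms, gross_threshold_ms=10.0):
    """Compare two per-frame pitch-period tracks (ms, 0 = unvoiced) frame by frame."""
    ref = np.asarray(ref_ms, dtype=float)
    test = np.asarray(test_ms, dtype=float)
    ref_voiced = ref > 0
    test_voiced = test > 0
    both = ref_voiced & test_voiced
    diff = test[both] - ref[both]            # pitch error on frames voiced in both tracks
    gross = np.abs(diff) > gross_threshold_ms
    fine = diff[~gross]                      # remaining (fine) pitch errors
    return {
        "gross_errors": int(gross.sum()),
        "voiced_to_unvoiced": int((ref_voiced & ~test_voiced).sum()),
        "unvoiced_to_voiced": int((~ref_voiced & test_voiced).sum()),
        "fine_mean_ms": float(fine.mean()) if fine.size else 0.0,
        "fine_std_ms": float(fine.std()) if fine.size else 0.0,
    }


reference = [0.0, 5.0, 5.2, 5.4, 0.0, 8.0]   # clean-speech track (hypothetical values)
degraded = [0.0, 5.1, 0.0, 5.2, 4.0, 20.0]   # noisy-speech track (hypothetical values)
# 1 gross error, 1 voiced-to-unvoiced error, 1 unvoiced-to-voiced error
print(pitch_comparison(reference, degraded))
```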

Tasks 

1. Development of an LPC analysis parameter generator program.

a.  Program  name:  ANALYS.SAV 

b.  Program  author:  R.  Frost 

c.  Program  Description:  ANALYS.SAV  is  an 

adaptation  of  S.  Boll's  vocoder  program.  It 
writes  out  on  disk  the  energy,  error  energy, 
pitch,  voicing  decision,  LPC  predictor 
coefficients,  reflection  coefficients,  and 

speech  autocorrelations  at  each  analysis 
frame.  The  number  of  poles  is  variable,  and 
is  indicated  by  the  variable  k in  the  first 
data  instruction. 

2.  Development  of  a parameter  comparison  program. 

a.  Program  name:  NUCMPR.SAV 

b.  Program  author:  R.  Frost 

c. Program Description: NUCMPR.SAV (for NEW COMPARE) is the comparison program. It reads two parameter files created by ANALYS.SAV, which may be up to 20000 points long. For a 12 pole LPC analysis, this corresponds to about 64000 speech data points. Five basic options are available: (1) listing of the parameters on a frame by frame basis for the two files; (2) an overall comparison of the pitch characteristics of the two files, including plots of the pitch histories, the number of gross pitch errors (>10ms), voiced to unvoiced errors, unvoiced to voiced errors, and the mean and standard deviation of the fine pitch errors; (3) plots of the energy in each speech file; (4) plots of the error energy in each file; and (5) plots of both the maximum and minimum distances between the files, as described by Gray and Markel, "Distance Measures for Speech Processing," IEEE Trans. ASSP, Vol. 24, No. 5, pp. 380-391. Their approach is to define a metric based on the rms log spectral distance. This distance can then be estimated efficiently by computing an upper and a lower bound, the "cosh" and "cepstral" approximations, respectively.

These  separate  functions  are  obtained  by 
typing  the  appropriate  command  after  the 
command  generator  herald,  which  is  &&  . 
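The cepstral estimate can be sketched as follows, assuming the usual LPC model spectrum G²/|A(e^{jω})|² with A(z) = 1 + Σ a(i) z^{-i}. The cepstrum of the log model spectrum follows from the predictor coefficients by a standard recursion, and by Parseval's relation a truncated cepstral sum can only underestimate the rms log spectral distance, which is why it serves as a lower bound. The function names are hypothetical, and the "cosh" measure is not shown.

```python
import numpy as np

DB_PER_NEPER = 10.0 / np.log(10.0)  # converts natural-log power ratios to dB


def lpc_to_cepstrum(a, gain_sq, n_ceps):
    """Cepstral coefficients c(0..n_ceps) of ln(gain_sq / |A(e^jw)|^2), A(z) = 1 + sum a[i] z^-i."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    c[0] = np.log(gain_sq)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c


def cepstral_distance_db(c1, c2):
    """Truncated-cepstrum estimate of the rms log spectral distance, in dB."""
    d = np.asarray(c1) - np.asarray(c2)
    return DB_PER_NEPER * np.sqrt(d[0] ** 2 + 2.0 * np.sum(d[1:] ** 2))


def rms_log_spectral_distance_db(a1, g1, a2, g2, nfft=4096):
    """Brute-force reference: rms difference of the two dB model spectra around the unit circle."""
    v1 = 10.0 * np.log10(g1 / np.abs(np.fft.fft(a1, nfft)) ** 2)
    v2 = 10.0 * np.log10(g2 / np.abs(np.fft.fft(a2, nfft)) ** 2)
    return float(np.sqrt(np.mean((v1 - v2) ** 2)))


def resonance(r, theta):
    """Second-order A(z) with a conjugate pole pair at radius r, angle theta (rad)."""
    return np.array([1.0, -2.0 * r * np.cos(theta), r * r])


a1, g1 = resonance(0.9, 0.5), 1.0
a2, g2 = resonance(0.8, 0.7), 2.0
exact = rms_log_spectral_distance_db(a1, g1, a2, g2)
long_est = cepstral_distance_db(lpc_to_cepstrum(a1, g1, 200), lpc_to_cepstrum(a2, g2, 200))
short_est = cepstral_distance_db(lpc_to_cepstrum(a1, g1, 10), lpc_to_cepstrum(a2, g2, 10))
print(f"{exact:.4f} {long_est:.4f} {short_est:.4f}")  # first two agree; the third is no larger
```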

3. Results using program NUCMPR:

a. Our experience with this approach is that these measures are consistent and are evaluated reasonably quickly. As an example, an utterance was corrupted by adding various amounts of noise. The distances between the various versions were computed and are tabulated below. In each case the distance is measured from the utterance having an SNR of 40 dB.


SNR       Cosh Distance Measure       Cepstral Distance Measure
          mean        σ               mean        σ
40 dB     0 dB        0 dB            0 dB        0 dB
30 dB     2.50 dB     1.14 dB         2.37 dB     1.07 dB
20 dB     5.35 dB     2.04 dB         4.50 dB     1.69 dB
10 dB     7.74 dB     2.61 dB         6.15 dB     2.19 dB
0 dB      9.42 dB     3.33 dB         7.33 dB     2.88 dB
-10 dB    10.70 dB    4.39 dB         8.14 dB     3.64 dB

In general, our experience is consistent with the conjecture of Gray and Markel that distances of less than about 2 dB are difficult to perceive, while greater distances are quite noticeable and become increasingly offensive.

b. As noted above, the time histories of various vocoder parameters can be saved and plotted for comparison. As an example of the outputs available, time histories of the pitch period, energy, error energy, and Gray and Markel spectral distances were computed for the clean, noisy, and processed speech used in the dual input adaptive noise cancelling experiment described in Section V. The data presented represent analysis parameters from 248 analysis frames.


For each figure, part (a) compares the parameter histories of the clean speech versus the processed speech, and part (b) compares the parameter histories of the noisy speech versus the processed speech. The time histories include:

(1) Figure III.7   Pitch and Voicing (unvoiced equals zero)
(2) Figure III.8   Signal Energy
(3) Figure III.9   LPC Prediction Error Energy
(4) Figure III.10  Maximum and Minimum Gray and Markel Spectral Distances
Figure III.10: Time Histories of Gray and Markel Spectral Distances


SECTION IV

An Integrated Noise Suppression-Speech Analysis Algorithm:
Predictive Noise Cancellation

A. Preface


This section describes a preliminary investigation of a method for noise suppression in which the analysis autocorrelations are modified to account explicitly for additive noise present on the speech waveform. The method, Predictive Noise Cancellation, gets its name from the fact that an estimate of the current noise is adaptively predicted from long term noise statistics. A description of the method is provided in the accompanying paper, "Improving Linear Prediction Analysis of Noisy Speech by Predictive Noise Cancellation," presented at the 1977 International Conference on Acoustics, Speech and Signal Processing, Hartford, Connecticut. The objectives, approach, tasks, and accompanying theory are given in the paper. Results and future research efforts implied by this study are listed below.


B. Results

1. Advantages of Predictive Noise Cancellation.

a. The method uses procedures which are currently available in real time LPC vocoders:

(1) Autocorrelations and convolutions
(2) Levinson recursions
(3) All-pole synthesis

b. The method results in a stable synthesis.

(1) The autocorrelations can be guaranteed to be positive definite.

c. Background noise energy is reduced by 10 to 20 dB, depending upon the noise environment.

2. Disadvantages of Predictive Noise Cancellation.

a. The method is dependent upon the phase of the signals processed, since crosscorrelations are used.

(1) This required converting all signals to minimum phase realizations.

b. The noise-signal correlation filter estimate (Wiener filter) was corrupted when speech was present.

(1) The estimator requires the crosscorrelation $R_{\bar{n}n}(k)$ between the current noise n(k) and the average noise $\bar{n}(k)$. This correlation term is given by:

$$R_{\bar{n}n}(k) = R_{\bar{n}x}(k) - R_{\bar{n}s}(k)$$

(2) When $R_{\bar{n}s}(k)$ is nonzero (speech present), $R_{\bar{n}x}(k)$ becomes a poor estimate of $R_{\bar{n}n}(k)$.

C. Future Research

Based upon the inadequacies of the Predictive Noise Cancellation model, a frequency domain spectral averaging technique is currently being developed. This method retains the advantages of PNC (i.e., use of LPC algorithms, stable synthesis, and comparable noise suppression) but avoids the disadvantages. Results will be presented in the next Semi-Annual Technical Report.
IMPROVING LINEAR PREDICTION ANALYSIS OF
NOISY SPEECH BY PREDICTIVE NOISE CANCELLATION

Steven  F.  Boll 


Computer  Science  Department 
University  of  Utah 
Salt  Lake  City,  Utah  84112 


Abstract 

The analysis of speech using Linear Prediction is reformulated to account for the presence of acoustically added noise, and a technique is presented for reducing its effect on parameter estimation. The method, called Predictive Noise Cancellation (PNC), modifies the noisy speech autocorrelations using an estimate of the present background noise which is adaptively updated from an average all-pole noise spectrum. The all-pole noise spectrum is calculated by averaging autocorrelations during non-speech activity. The method uses procedures which are already available to the LPC analyzer, and thus is well suited for real time analysis of noisy speech. Preliminary results show signal to noise improvements on the order of 10 to 20 dB.


Introduction

As noise is acoustically added to speech, the resulting intelligibility and quality of the LPC synthesis degrade [1], [l]. This paper presents a technique which accounts for the noise present and modifies the noisy speech autocorrelations in order to suppress it. The method is based upon the simple observation that if x(k) = s(k) + n(k), where s(k) is clean speech, n(k) is the added noise, and x(k) is their sum, and if the noise signal n(k) were known exactly, then the desired speech autocorrelations $R_{ss}(m)$ could be recovered from the noisy speech x(k) by computing:

$$R_{ss}(m) = R_{xx}(m) - R_{xn}(m) - R_{nx}(m) + R_{nn}(m) \qquad (1)$$

where

$$R_{xn}(m) = \sum_k x(k)\,n(k+m) = R_{nx}(-m)$$

$$R_{xx}(m) = \sum_k x(k)\,x(k+m)$$

$$R_{ss}(m) = \sum_k s(k)\,s(k+m)$$

Of course the noise is not known within any given analysis frame and must be approximated. A method for estimating it is the subject of this paper. Once an estimate for the local noise component is determined, equation (1) can be used to calculate the autocorrelations of the estimated speech spectrum, from which the LPC parameters can be obtained.

Constraints

Since the noise cancellation is to be integrated into the LPC analysis, it was decided that the estimation of the present noise component be done using algorithms already available to the LPC analyzer. In addition, noise characterization and estimation should depend only upon the actual background environment as recorded by the microphone.

Plan

To satisfy these constraints, the noise environment is modeled by an all-pole spectrum. It is estimated by averaging autocorrelations during an initial period of non-speech activity. These averaged noise autocorrelations are then used to estimate the noise component of the present frame. The local noise component is estimated by convolving the average noise autocorrelations with a correlation filter whose impulse response is estimated for each frame to minimize the mean square error between the average noise and the noisy signal.

Thus the method can be described as that of adaptively filtering past noise to approximate present noise.

Method

There are four phases to the process of Predictive Noise Cancellation. They are: (1) estimation of the average background noise using LPC; (2) estimation of the noise-signal correlation filter; (3) modification of the noisy speech autocorrelations; and (4) calculation of the final LPC parameters.

Background Noise Estimation

During the startup or a calibration period, when only background noise is recorded by the microphone, the first M+1 autocorrelations representing just noise are computed and averaged together. Define:

$$\bar{R}_{nn}(m) = \frac{1}{\bar{N}} \sum_{l=1}^{\bar{N}} R^{l}_{xx}(m), \qquad m = 0, 1, \ldots, M \qquad (2)$$

where

$$R^{l}_{xx}(m) = \sum_{k=0}^{N-1-m} x(k)\,x(k+m)$$

is the mth autocorrelation during the lth frame to be averaged, and

x(k) = noise signal (s(k) = 0)
$\bar{N}$ = number of frames to be averaged (normally 1/2 sec)
M = order of the all-pole noise spectrum (set to 10)

At the completion of the calibration period, predictor coefficients $a_n(k)$ representing the noise are computed using Levinson's recursion.

Finally, since it will be necessary to compute the crosscorrelation between the average noise $\bar{n}(k)$ and the noisy signal x(k), the first N values of the minimum phase impulse response $\bar{n}(k)$ defined from $a_n(k)$ are computed as:

$$\bar{n}(k) = -\sum_{i=1}^{M} a_n(i)\,\bar{n}(k-i) + G_n\,\delta(k), \qquad k = 0, 1, \ldots, N-1 \qquad (3)$$

where

$$G_n^2 = \bar{R}_{nn}(0) + \sum_{i=1}^{M} a_n(i)\,\bar{R}_{nn}(i)$$

N = analysis window length (nominally 20 ms)

Noise-Signal Correlation Filter

A block diagram indicating the noise cancellation procedure is shown in Figure 1.

Figure 1: Predictive Noise Cancellation Block Diagram (s: speech; n: noise; $\bar{n}$: averaged noise; $H(z) = \sum_{i=0}^{L} h(i)\,z^{-i}$: correlation filter; x: noisy speech; u: filtered average noise; $\hat{s}$: noise cancelled speech)

The purpose of H(z) is to modify $\bar{n}(k)$ to approximate the noise n(k) within the current analysis frame. The filter is estimated using a least squares criterion. The tap parameters of H(z) are estimated in order to minimize

$$\sum_{k} \Big[ x(k) - \sum_{i=0}^{L} h(i)\,\bar{n}(k-i) \Big]^2 \qquad (4)$$

Minimizing Equation (4) with respect to h(i) results in a Toeplitz system of linear equations:

$$\sum_{i=0}^{L} h(i)\,R_{\bar{n}\bar{n}}(i-j) = R_{\bar{n}x}(j), \qquad j = 0, 1, \ldots, L \qquad (5)$$

where

$$R_{\bar{n}x}(j) = \sum_{k=0}^{N-1} \bar{n}(k)\,\tilde{x}(k+j)$$

$$\tilde{x}(k) = -\sum_{i=1}^{M} a_x(i)\,\tilde{x}(k-i) + G_x\,\delta(k)$$

It was necessary to use the LPC minimum phase approximation $\tilde{x}(k)$ to x(k), since $\bar{n}(k)$ is an LPC minimum phase approximation. Note that H(z) can be calculated using the two-pass Levinson recursion [2], [3].

After estimating H(z), it is normalized to have a spectral average of unity by dividing each tap parameter h(i) by h(0). This normalization was included since the purpose of H(z) is to shape the spectrum of $\bar{n}(k)$ but not to increase its total energy.
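Phases (1) and (2) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the paper's implementation: rectangular (unwindowed) frame autocorrelations, M = L = 10 and 128-sample frames as in the vocoder specifications listed under Implementation and Results, white noise standing in for both the calibration noise and the noisy frame, and the Toeplitz system (5) solved by ordinary Gaussian elimination instead of the two-pass Levinson recursion the paper cites. All function names are hypothetical.

```python
import numpy as np


def autocorr(x, max_lag):
    """R(m) = sum_k x(k) x(k+m), m = 0..max_lag, within one frame."""
    N = len(x)
    return np.array([np.dot(x[: N - m], x[m:]) for m in range(max_lag + 1)])


def levinson(r, order):
    """Levinson-Durbin: returns A(z) = 1 + sum a(i) z^-i and G^2 = R(0) + sum a(i) R(i)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def allpole_impulse_response(a, gain_sq, N):
    """Eq. (3): y(k) = -sum_{i=1}^{M} a(i) y(k-i) + G delta(k), k = 0..N-1."""
    M = len(a) - 1
    y = np.zeros(N)
    for k in range(N):
        acc = np.sqrt(gain_sq) if k == 0 else 0.0
        for i in range(1, min(k, M) + 1):
            acc -= a[i] * y[k - i]
        y[k] = acc
    return y


def toeplitz_solve(r, rhs):
    """Solve sum_i h(i) r(|i-j|) = rhs(j) for j = 0..len(rhs)-1."""
    idx = np.arange(len(rhs))
    T = r[np.abs(idx[:, None] - idx[None, :])]
    return np.linalg.solve(T, rhs)


def average_noise_model(noise_frames, M):
    """Phase 1, Eq. (2): average noise-only frame autocorrelations, then Levinson."""
    r_bar = np.mean([autocorr(f, M) for f in noise_frames], axis=0)
    return levinson(r_bar, M)


def correlation_filter(x_frame, a_n, gn_sq, M, L):
    """Phase 2, Eqs. (4)-(5): least-squares taps h(0..L), normalized so that h(0) = 1."""
    N = len(x_frame)
    n_bar = allpole_impulse_response(a_n, gn_sq, N)    # average-noise impulse response
    a_x, gx_sq = levinson(autocorr(x_frame, M), M)
    x_tilde = allpole_impulse_response(a_x, gx_sq, N)  # minimum-phase stand-in for x(k)
    r_nn = autocorr(n_bar, L)                          # R_nbar,nbar(0..L)
    r_nx = np.array([np.dot(n_bar[: N - j], x_tilde[j:]) for j in range(L + 1)])
    h = toeplitz_solve(r_nn, r_nx)
    return h / h[0], n_bar


rng = np.random.default_rng(0)
N, M, L = 128, 10, 10                          # 19.2 ms frames at 6.667 kHz
noise_frames = rng.standard_normal((26, N))    # about 0.5 s of noise-only frames
a_n, gn_sq = average_noise_model(noise_frames, M)
x_frame = rng.standard_normal(N)               # stand-in for one frame of noisy speech
h, n_bar = correlation_filter(x_frame, a_n, gn_sq, M, L)
```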

Autocorrelation Modification

Referring to Figure 1, the autocorrelations of the noise cancelled speech $\hat{s}(k)$ are given by

$$R_{\hat{s}\hat{s}}(m) = R_{xx}(m) - R_{xu}(m) - R_{xu}(-m) + R_{uu}(m), \qquad m = 0, 1, \ldots, M \qquad (6)$$

It is not necessary to calculate u(k) explicitly in order to obtain $R_{xu}(m)$ and $R_{uu}(m)$. These correlation terms can be calculated from $R_{x\bar{n}}(m)$ and $R_{\bar{n}\bar{n}}(m)$ as follows:


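Since

$$u(k) = \sum_{i=0}^{L} h(i)\,\bar{n}(k-i), \qquad k = 0, 1, \ldots, N-1 \qquad (7)$$

then

$$R_{xu}(m) = \sum_{k=0}^{N-1} x(k)\,u(k+m) = \sum_{k=0}^{N-1} x(k) \sum_{i=0}^{L} h(i)\,\bar{n}(k+m-i) \qquad (8)$$

or

$$R_{xu}(m) = \sum_{i=0}^{L} h(i) \sum_{k=0}^{N-1} x(k)\,\bar{n}(k+m-i) \qquad (9)$$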

In terms of $R_{x\bar{n}}(m)$ we have

$$R_{xu}(m) = \sum_{i=0}^{L} h(i)\,R_{x\bar{n}}(m-i) \qquad (10)$$

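Likewise, $R_{uu}(m)$ can be obtained from $R_{\bar{n}\bar{n}}(m)$ and h(i) as follows:

$$R_{uu}(m) = h(m) * h(-m) * R_{\bar{n}\bar{n}}(m) \qquad (11)$$

Let

$$R_{hh}(m) = h(m) * h(-m) = \sum_{i=0}^{L} h(i)\,h(i+m) \qquad (12)$$

then

$$R_{uu}(m) = \sum_{i=-L}^{L} R_{hh}(i)\,R_{\bar{n}\bar{n}}(m-i), \qquad m = 0, 1, \ldots, M \qquad (13)$$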
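Equations (10) through (13) can be checked against a direct computation that forms u(k) explicitly. The sketch below (NumPy; helper names hypothetical, random sequences as stand-ins) evaluates every correlation as a sum over all k with the sequences taken as zero outside their support. Under that convention the identities, and Equation (6) assembled from them, hold exactly. Because the paper truncates u(k) and the sums to the N-sample frame, frame-limited values will agree with these only approximately near the frame edges.

```python
import numpy as np


def xcorr_full(a, b, m):
    """R_ab(m) = sum_k a(k) b(k+m) over all k; a, b are zero outside their support; m may be negative."""
    if m < 0:
        return xcorr_full(b, a, -m)
    n = min(len(a), len(b) - m)
    return float(np.dot(a[:n], b[m:m + n])) if n > 0 else 0.0


def r_xu_eq10(h, x, n_bar, m):
    """Eq. (10): R_xu(m) = sum_i h(i) R_x,nbar(m - i)."""
    return sum(h[i] * xcorr_full(x, n_bar, m - i) for i in range(len(h)))


def r_uu_eq13(h, n_bar, m):
    """Eqs. (12)-(13): R_uu(m) = sum_{i=-L}^{L} R_hh(i) R_nbar,nbar(m - i)."""
    L = len(h) - 1
    return sum(xcorr_full(h, h, i) * xcorr_full(n_bar, n_bar, m - i) for i in range(-L, L + 1))


def r_shat_eq6(h, x, n_bar, m):
    """Eq. (6): R_shat(m) = R_xx(m) - R_xu(m) - R_xu(-m) + R_uu(m), without forming u(k)."""
    return (xcorr_full(x, x, m) - r_xu_eq10(h, x, n_bar, m)
            - r_xu_eq10(h, x, n_bar, -m) + r_uu_eq13(h, n_bar, m))


rng = np.random.default_rng(3)
x = rng.standard_normal(128)        # stand-in noisy-speech frame
n_bar = rng.standard_normal(128)    # stand-in average-noise sequence
h = rng.standard_normal(11)         # stand-in correlation filter, L = 10
u = np.convolve(h, n_bar)           # direct u(k) = sum_i h(i) n_bar(k - i)
s_hat = np.concatenate([x, np.zeros(len(u) - len(x))]) - u
for m in range(11):
    assert abs(r_xu_eq10(h, x, n_bar, m) - xcorr_full(x, u, m)) < 1e-8
    assert abs(r_uu_eq13(h, n_bar, m) - xcorr_full(u, u, m)) < 1e-8
    assert abs(r_shat_eq6(h, x, n_bar, m) - xcorr_full(s_hat, s_hat, m)) < 1e-8
```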


LPC  Parameter  Calculation 




Having calculated $R_{xu}(m)$ and $R_{uu}(m)$, the autocorrelations $R_{\hat{s}\hat{s}}(m)$ of the noise cancelled speech can be computed using Equation (6). From these, the LPC coefficients can be calculated using Levinson's recursion. A stable filter will result since $R_{\hat{s}\hat{s}}(m)$ is positive definite.
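The stability argument can be made concrete with the Levinson-Durbin recursion. When the autocorrelation matrix is positive definite, every reflection coefficient satisfies |k_i| < 1, and the resulting A(z) has all of its zeros inside the unit circle, so 1/A(z) is a stable synthesis filter. The sketch below (NumPy; the function name is hypothetical) shows this for autocorrelations computed from a frame. It also shows the recursion producing |k| > 1 for a sequence that is not a valid (positive definite) autocorrelation, which is the failure that positive definiteness of the modified autocorrelations rules out.

```python
import numpy as np


def levinson_with_reflection(r, order):
    """Levinson-Durbin for A(z) = 1 + sum a(i) z^-i; also returns reflection coefficients k(1..order)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    ks = np.zeros(order)
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        ks[i - 1] = k
        err *= 1.0 - k * k
    return a, err, ks


rng = np.random.default_rng(0)
frame = rng.standard_normal(128)
r = np.array([np.dot(frame[: 128 - m], frame[m:]) for m in range(11)])
a, err, ks = levinson_with_reflection(r, 10)
print(np.all(np.abs(ks) < 1), np.all(np.abs(np.roots(a)) < 1))   # True True

# A sequence that is not positive definite breaks the bound.
_, _, bad_ks = levinson_with_reflection(np.array([1.0, 0.95, -0.5]), 2)
print(bad_ks)   # second coefficient has magnitude greater than 1
```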


Implementation and Results

The algorithm was inserted into an LPC vocoder simulation and tested on a data base consisting of three types of noisy speech. Type one was clean speech plus known amounts of Gaussian noise digitized from an analog noise generator. Type two was clean speech plus known amounts of noise recorded in a helicopter cockpit. Type three was speech recorded in a helicopter. Specifications for the vocoder simulation were as follows:

Sampling frequency = 6.667 kHz
Analysis window length, N = 19.2 ms
Predictor order, M = 10
Correlation filter order, L = 10
Initial averaging period, $\bar{N}$ = 0.5 sec

Results

An audio tape demonstrating the results will be played. A coarse measure of signal to noise improvement can be calculated by comparing the energy before cancellation, $R_{xx}(0)$, with the energy after cancellation, $R_{\hat{s}\hat{s}}(0)$. An improvement on the order of 10 to 20 dB was observed for all types of noisy speech. Methods for measuring improvements in quality and intelligibility are currently being investigated.

Conclusion

An integrated system for noise cancellation coupled with LPC analysis has been presented. The method assumes that the noise present during the current analysis frame can be estimated by filtering an all-pole average noise spectrum through an adaptively updated linear filter. The noisy speech autocorrelations are then modified to account for the noise estimate. The algorithm is currently being tested in a variety of noisy operating environments, with preliminary results showing a signal to noise improvement of 10 to 20 dB.

SECTION V


A PREPROCESSING  NOISE  CANCELLATION  ALGORITHM: 

DUAL  INPUT  NOISE  SUPPRESSION 

DENNIS  PULSIPHER 
Introduction 

The presence of noise in speech signals has long been annoying. Many techniques for reducing various types of noise have been described and implemented. These techniques have fallen mainly into two categories: direct linear filtering and model fitting. Though many methods of deriving the filter to be used have been proposed, the filtering techniques have one thing in common: their attempt to improve the signal-to-noise ratio (SNR) is accomplished by attenuating those frequencies with poor SNR and giving emphasis to those with higher SNR, subject to certain other constraints.

Model fitting has been used to reduce noise by estimating a set of parameters which are then used to synthesize a signal estimate. Among the best examples of noise reduction by model fitting are vocoders, particularly the interactive homomorphic vocoder implemented by Neil Miller in 1973 [1].
New impetus has been given to research in the area of noise reduction by the recent implementation of low-bandwidth digital transmission schemes, such as linear-predictive vocoders (LPC). In noisy environments such vocoders perform poorly.

Efforts  to  develop  digital  algorithms  to  minimize  the 
noise-induced  problems  created  by  such  environments  as 
helicopters,  airplanes,  ships,  and  even  offices  have  been 
intensified.  Additional  encouragement  has  been  derived  from 
the  availability  of  digital  processors  capable  of  performing 
complex  algorithms  at  real-time  speeds. 

It is in this setting that we present a technique for using information obtained by measuring both a noisy signal and a signal containing only related noise to estimate a noise-free signal. We then present an algorithm for implementing the technique. A brief report on experiments performed to evaluate the technique, including data base generation and observations about the results, precedes the conclusion of the report. The observations and conclusions represent working ideas and should not be considered final at this time.
NOISE CANCELLATION


In December of 1975 Bernard Widrow proposed the use of Adaptive Noise Cancellation for the removal of noise from pilot-to-ground communication [2]. He also reported the results of an experiment using very stylized noise. Adaptive noise cancellation was not new; many applications had been found where it worked well. Among these were antenna side-lobe cancellation [3], data channel equalization [4], telephone channel echo cancellation [5] [6] [7], and noise reduction in electrocardiography [5]. Prior to this, however, no attempt that we are aware of was made to apply this technique to noise suppression in speech signals.

This  noise  cancellation  technique  differs  significantly 
from  classical  techniques.  Noise  reduction  is  attempted  by 
estimating  the  noise,  then  subtracting  it  from  the  noisy 
signal.  Without  using  extreme  care,  this  could  result  in  an 
increase  in  noise  power,  so  we  should  examine  the  mechanism 
by  which  a reduction  is  achieved. 

If we are given the sum $x = s + N$ of two mutually
uncorrelated signals $s$ and $N$, and a third signal $V$ which
is uncorrelated with $s$, let us form a signal estimate
$\hat{s} = x - u$, where $u$ is the output of a linear filter
$H$ driven by $V$. Then

(1) $\hat{s} - s = N - u$,

and, since $s$ is uncorrelated with both $N$ and $u$,

$$E[\hat{s}^2] = E[s^2] + E[(N - u)^2].$$

$E[s^2]$ is unaffected by changing $H$. That is, if we minimize
the energy in $\hat{s}$, we have minimized the mean squared value
of $(N - u)$ and, consequently, the mean squared value of
$(\hat{s} - s)$, since by (1) $\hat{s} - s = N - u$. Thus $\hat{s}$
is a minimum mean-squared-error estimate of $s$. The constraint,
of course, is that our minimization of $E[\hat{s}^2]$ be
accomplished with a $u$ that is a linearly filtered version
of $V$.
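The argument can be checked numerically. The Python sketch below uses a hypothetical toy setup that is not one of the report's experiments: white Gaussian $s$ and $V$, a one-tap channel $N = gV$ with an assumed gain $g = 0.8$, and a one-tap filter $u = hV$. Sweeping $h$ over a grid, the gain that minimizes the output energy $E[\hat{s}^2]$ should coincide with the gain that minimizes the estimation error $E[(\hat{s} - s)^2]$, since the two quantities differ only by $E[s^2]$ plus a small sampling-noise cross term.

```python
import random

def sweep_gains(g=0.8, n=20000, seed=1):
    """Sweep a one-tap filter gain h and record output power and estimation error.

    Toy model (an assumption for illustration): x = s + N with N = g*V,
    and u = h*V, so s_hat = x - u.
    """
    rng = random.Random(seed)
    s = [rng.gauss(0.0, 1.0) for _ in range(n)]   # signal
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]   # reference noise, independent of s
    x = [si + g * vi for si, vi in zip(s, v)]     # noisy measurement
    rows = []
    for k in range(21):
        h = k / 10.0                              # candidate gains 0.0 ... 2.0
        s_hat = [xi - h * vi for xi, vi in zip(x, v)]
        out_power = sum(e * e for e in s_hat) / n                      # estimate of E[s_hat^2]
        err_power = sum((e - si) ** 2 for e, si in zip(s_hat, s)) / n  # estimate of E[(s_hat - s)^2]
        rows.append((h, out_power, err_power))
    return rows

rows = sweep_gains()
h_min_output = min(rows, key=lambda r: r[1])[0]   # gain that minimizes output energy
h_min_error = min(rows, key=lambda r: r[2])[0]    # gain that minimizes estimation error
print(h_min_output, h_min_error)
```

Only the output energy is observable in practice; the error column is available here because the toy signal $s$ is known.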

It  remains  for  us  to  describe  a model  which  could 
benefit  from  this  analysis,  and  to  present  an  algorithm  for 
its  implementation.  The  model  we  choose  to  assume  for 
initial  experimentation  is  one  with  a single  signal  source 
and  a single  noise  source. 

The  noise  V is  recorded  by  a microphone  placed  so  that 
the  signal  s does  not  reach  it.  The  noise  is  also 
transmitted  through  a channel  G and  is  recorded  (at  a second 
microphone)  along  with  the  signal  s as  the  noisy  signal  x. 
(Figure  V.2) 
Figure V.2

Data generation model


If G can be approximated as a finite impulse response
filter, clearly making H in Figure V.1 equal to G will
result in a u equal to the noise in x. By subtracting u
from x we are left with s. In general, finding an optimal H
is a problem equivalent to plant identification.
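To make the model concrete, the following sketch (hypothetical values throughout) simulates Figure V.2 with an assumed three-tap FIR channel G and shows that choosing H equal to G makes u equal the noise in x, so that x - u returns s up to floating-point rounding. The helper `fir` is an illustrative name, not something from the report.

```python
import random

def fir(taps, signal):
    """Causal FIR filter y[j] = sum_k taps[k] * signal[j - k], zero initial state."""
    out = []
    for j in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if j - k >= 0:
                acc += t * signal[j - k]
        out.append(acc)
    return out

rng = random.Random(0)
G = [0.5, -0.3, 0.2]                            # hypothetical FIR channel to the signal microphone
s = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for the signal s
v = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # reference noise V
noise_in_x = fir(G, v)
x = [si + ni for si, ni in zip(s, noise_in_x)]  # x = s + (G applied to V)

H = list(G)                                     # choose H equal to G
u = fir(H, v)                                   # noise estimate
s_hat = [xi - ui for xi, ui in zip(x, u)]
max_residual = max(abs(a - b) for a, b in zip(s_hat, s))
print(max_residual)  # zero up to floating-point rounding
```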

This analogy renews our hope that the technique may
work well in extremely noisy environments, since H can be
estimated best when x and V are closest to being linearly
filtered versions of each other. This happens when s is
smallest relative to x. That is, we have the best chance of
eliminating the noise in x when x is noisiest, which is
exactly when we need it most for our application.

The model requires that we find an algorithm to best
estimate H so that our minimum mean-squared-error estimate
of s can be obtained.


THE  ALGORITHM 


The heart of the adaptive algorithm is the adaptive
filter (channel) $H_j$. Through it pass the noise vectors
$V_j$, resulting in the noise estimate $u_j$. That is,

$$u_j = V_j^T H_j,$$

where $V_j$ and $H_j$ are $L$-element vectors and $j$ denotes
the time at which they occur.
The estimate $\hat{s}_j$ is calculated by subtracting $u_j$
from $x_j$:

$$\hat{s}_j = x_j - u_j = x_j - V_j^T H_j = x_j - H_j^T V_j;$$

squaring yields

$$\hat{s}_j^2 = (x_j - u_j)^2 = x_j^2 - 2x_j V_j^T H_j + H_j^T V_j V_j^T H_j,$$

and taking the expected value gives

$$E[\hat{s}_j^2] = E[x_j^2] - 2E[x_j V_j^T H_j] + E[H_j^T V_j V_j^T H_j].$$


Assuming a stationary filter $H$ gives

$$E[\hat{s}_j^2] = E[x_j^2] - 2E[x_j V_j^T]H + H^T E[V_j V_j^T] H.$$


Defining

$$P = E[x_j V_j]$$

(the cross-correlation between the noisy signal $x_j$, a scalar,
and the noise reference $V_j$, a vector) and

$$R = E[V_j V_j^T]$$

(the reference input correlation matrix), we have

$$E[\hat{s}_j^2] = E[x_j^2] - 2P^T H + H^T R H,$$


which is a quadratic function of $H$ and hence, provided $R$
is positive definite, has a unique minimum $H^*$. By
differentiating with respect to the elements of $H$ we get
the gradient

$$\nabla = -2P + 2RH.$$

Setting $\nabla = 0$ to find the minimum yields

$$H^* = R^{-1}P.$$
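The closed-form solution can be illustrated by estimating $R$ and $P$ from sample averages and solving $RH = P$. The sketch below is a plain-Python illustration under assumed conditions (white Gaussian reference noise, an uncorrelated signal, and a hypothetical three-tap channel G); with enough samples the computed $H^*$ should lie close to G. Gaussian elimination is used only to keep the example dependency-free, and all function names are illustrative.

```python
import random

def fir(taps, signal):
    """Causal FIR filter with zero initial state."""
    return [sum(t * signal[j - k] for k, t in enumerate(taps) if j - k >= 0)
            for j in range(len(signal))]

def solve(A, b):
    """Solve A h = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]           # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))   # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):                          # back substitution
        h[r] = (M[r][n] - sum(M[r][k] * h[k] for k in range(r + 1, n))) / M[r][r]
    return h

def wiener_taps(x, v, L):
    """Estimate H* = R^{-1} P from sample averages, V_j = [v_j, v_{j-1}, ..., v_{j-L+1}]."""
    R = [[0.0] * L for _ in range(L)]
    P = [0.0] * L
    count = 0
    for j in range(L - 1, len(x)):
        V = [v[j - k] for k in range(L)]
        for a in range(L):
            P[a] += x[j] * V[a]                  # accumulates x_j V_j
            for b in range(L):
                R[a][b] += V[a] * V[b]           # accumulates V_j V_j^T
        count += 1
    R = [[r / count for r in row] for row in R]
    P = [p / count for p in P]
    return solve(R, P)

rng = random.Random(2)
G = [0.5, -0.3, 0.2]                             # hypothetical channel to be identified
v = [rng.gauss(0.0, 1.0) for _ in range(20000)]
s = [rng.gauss(0.0, 1.0) for _ in range(20000)]
x = [si + ni for si, ni in zip(s, fir(G, v))]
H_star = wiener_taps(x, v, L=3)
print([round(h, 3) for h in H_star])             # should be close to G
```

This batch solution needs the whole record in advance, which is why the report turns next to a recursive, real-time alternative.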


Since we are attempting to implement the process in
real time, we might use a steepest descent algorithm

$$H_{j+1} = H_j - \mu \nabla_j,$$

where the parameter $\mu$ controls convergence and stability.
Unfortunately, we do not have access to $\nabla_j$, so we must be
satisfied with a gradient estimate $\hat{\nabla}_j$. Widrow [2] has
suggested the use of

$$\hat{\nabla}_j = -2\hat{s}_j V_j,$$

which yields the algorithm

$$H_{j+1} = H_j + 2\mu \hat{s}_j V_j,$$

which is simple to implement.
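The update rule $H_{j+1} = H_j + 2\mu \hat{s}_j V_j$ translates almost line for line into code. The sketch below is a minimal illustration, not the report's implementation: the channel G, the step size mu = 0.005, the filter length L = 3, and the signal levels are all assumptions, chosen so that the noise is much stronger than the signal, the regime the report identifies as favorable for estimating H.

```python
import random

def fir(taps, signal):
    """Causal FIR filter with zero initial state."""
    return [sum(t * signal[j - k] for k, t in enumerate(taps) if j - k >= 0)
            for j in range(len(signal))]

def lms_cancel(x, v, L, mu):
    """Adaptive noise canceller using the gradient estimate -2 * s_hat_j * V_j.

    u_j = V_j^T H_j,  s_hat_j = x_j - u_j,  H_{j+1} = H_j + 2*mu*s_hat_j*V_j,
    with V_j = [v_j, v_{j-1}, ..., v_{j-L+1}] (zeros before the first sample).
    """
    H = [0.0] * L
    s_hat = []
    for j in range(len(x)):
        V = [v[j - k] if j - k >= 0 else 0.0 for k in range(L)]
        u = sum(h * vk for h, vk in zip(H, V))       # noise estimate u_j
        e = x[j] - u                                 # signal estimate s_hat_j
        H = [h + 2.0 * mu * e * vk for h, vk in zip(H, V)]
        s_hat.append(e)
    return H, s_hat

rng = random.Random(4)
n = 20000
G = [0.5, -0.3, 0.2]                                 # hypothetical channel
v = [rng.gauss(0.0, 1.0) for _ in range(n)]
s = [rng.gauss(0.0, 0.1) for _ in range(n)]          # weak signal: x is very noisy
x = [si + ni for si, ni in zip(s, fir(G, v))]
H, s_hat = lms_cancel(x, v, L=3, mu=0.005)
tail = range(n - 5000, n)                            # samples after adaptation has settled
residual = sum((s_hat[j] - s[j]) ** 2 for j in tail) / len(tail)
print([round(h, 3) for h in H], residual)
```

A larger mu adapts faster but leaves more jitter in H; a smaller one does the reverse. This is the trade-off the report later discusses in perceptual terms.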


Figure  V.3 
Adaptive  Model 


Convergence of this estimate has been shown for $V_j$
uncorrelated with $V_k$ for $k \neq j$ [9], provided $\mu$ is chosen
small enough. Under the assumption of ergodicity, convergence
has also been shown for special cases of correlated $V_j$ [10]
[11]. Researchers at Bell Laboratories have found the
algorithm to be so robust that, for echo suppression and
channel equalization, the cruder estimate

$$\hat{\nabla}_j = -2\,\mathrm{SGN}[\hat{s}_j]\,V_j,$$

where $\mathrm{SGN}[\cdot]$ is the algebraic sign of its argument,
has been sufficiently accurate to produce convergence [5].
Others have proposed similar algorithms with constant and
time-varying $\mu$'s [12] [13] [14] [15]. The algorithm's
asymptotic behavior, residual error, and nonstationary
behavior in special cases have also been investigated
elsewhere [2] [4] [13] [14] [15] [16].
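For comparison, here is a sketch of the cruder sign-based variant, $H_{j+1} = H_j + 2\mu\,\mathrm{SGN}[\hat{s}_j]\,V_j$. Because each update has a fixed magnitude regardless of the size of the error, a smaller step size is assumed here (mu = 0.0005). The channel, signal levels, and function names are again hypothetical, and this is an illustration rather than the Bell Laboratories implementation.

```python
import random

def fir(taps, signal):
    """Causal FIR filter with zero initial state."""
    return [sum(t * signal[j - k] for k, t in enumerate(taps) if j - k >= 0)
            for j in range(len(signal))]

def sgn(a):
    """Algebraic sign: +1, -1, or 0."""
    return (a > 0) - (a < 0)

def sign_error_lms(x, v, L, mu):
    """Canceller using the gradient estimate -2 * SGN[s_hat_j] * V_j."""
    H = [0.0] * L
    for j in range(len(x)):
        V = [v[j - k] if j - k >= 0 else 0.0 for k in range(L)]
        e = x[j] - sum(h * vk for h, vk in zip(H, V))          # s_hat_j
        H = [h + 2.0 * mu * sgn(e) * vk for h, vk in zip(H, V)]
    return H

rng = random.Random(5)
n = 20000
G = [0.5, -0.3, 0.2]                                           # hypothetical channel
v = [rng.gauss(0.0, 1.0) for _ in range(n)]
s = [rng.gauss(0.0, 0.1) for _ in range(n)]
x = [si + ni for si, ni in zip(s, fir(G, v))]
H_sign = sign_error_lms(x, v, L=3, mu=0.0005)
print([round(h, 3) for h in H_sign])                           # should approach G
```

The sign operation removes one multiplication by a full-precision error value per tap, which is the practical appeal of this variant.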


THE  EXPERIMENTS 

In order to evaluate the potential of the adaptive
noise cancellation algorithm as a technique for the
suppression of noise in speech signals, several experiments
were performed. Initially, a data base of different types of
noise was generated. Each of these noise sources was then
passed through various known channels, and the results were
used to augment the data base. This processed noise was then
scaled and added to a speech signal. The resulting noisy
signal and the original reference noise were then applied as
inputs to the noise cancelling algorithm, and the resulting
filter estimates were compared with the known channel
responses. Similar experiments were performed on data
collected in real situations.

Providing a series of noise sources with varying
characteristics, while keeping the data base small enough
to store in limited space, was among the first tasks to be
undertaken. To obtain unpredictable, yet stylized, noise,
we decided to digitize analog noise of three types. A
Gaussian noise generator, a constant square wave, and a
hand-swept square wave were each low-pass filtered to
3.2 kHz and sampled at 6.67 kHz. Approximately 12.3 s of
each source was recorded.

Three known channels were then used to process the
previously digitized noise. A channel with a low-pass
cutoff at 1.5 kHz, a channel with three narrow passbands at
500 Hz, 1500 Hz, and 2500 Hz, and a channel with a simple
delay were selected to represent a variety of possible
channels (Figure V.4).
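The report does not say how the known channels were realized. As one possible illustration, the sketch below builds two of them as FIR filters at the report's 6.67 kHz sampling rate: a low-pass channel designed by the Hamming-windowed-sinc method (a technique chosen here, with an assumed length of 31 taps) and a pure delay. The three-passband channel is not sketched. Either tap list could be used as the channel G in a simulation of Figure V.2.

```python
import math

FS = 6670.0  # sampling rate used in the report (6.67 kHz)

def lowpass_taps(cutoff_hz, num_taps=31, fs=FS):
    """Hamming-windowed sinc low-pass FIR (hypothetical design), unit gain at DC."""
    fc = cutoff_hz / fs                          # normalized cutoff, cycles/sample
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    dc = sum(taps)
    return [t / dc for t in taps]                # normalize DC gain to 1

def delay_taps(d):
    """Pure delay of d samples expressed as an FIR channel."""
    return [0.0] * d + [1.0]

def magnitude(taps, f_hz, fs=FS):
    """Magnitude of the FIR frequency response at f_hz."""
    w = 2 * math.pi * f_hz / fs
    re = sum(t * math.cos(w * k) for k, t in enumerate(taps))
    im = -sum(t * math.sin(w * k) for k, t in enumerate(taps))
    return math.hypot(re, im)

lp = lowpass_taps(1500.0)
print(magnitude(lp, 500.0), magnitude(lp, 3000.0))   # passband near 1, stopband small
print(magnitude(delay_taps(4), 1234.0))              # a pure delay has unit gain
```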

The results of this processing were then measured by a
spectral weighting algorithm which attempted to measure the
energy in the signal as perceived by a listener. A similar
measurement was made of a speech signal, making it possible
to combine the two signals to yield a signal with a known
SNR. The appropriate scaling was performed to provide
signals with SNRs of 0, 20, and 40 dB.
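The scaling step amounts to choosing a gain for the noise so that the signal-to-noise power ratio hits a target. The sketch below assumes plain mean-square power in place of the report's perceptual spectral weighting, which is not specified here, and uses a sine tone as a hypothetical stand-in for speech; the function names are illustrative.

```python
import math
import random

def mean_power(signal):
    """Mean-square power of a sequence."""
    return sum(a * a for a in signal) / len(signal)

def scale_for_snr(speech, noise, target_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals target_db."""
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10.0 ** (target_db / 10.0)))
    return [gain * b for b in noise]

def measured_snr_db(speech, noise):
    """SNR in dB using unweighted mean-square power."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))

rng = random.Random(3)
fs = 6670.0                                                              # sampling rate from the report
speech = [math.sin(2 * math.pi * 440.0 * j / fs) for j in range(6670)]   # stand-in for speech
noise = [rng.gauss(0.0, 1.0) for _ in range(6670)]
mixtures = {}
for target in (0, 20, 40):
    scaled = scale_for_snr(speech, noise, target)
    mixtures[target] = [a + b for a, b in zip(speech, scaled)]           # noisy signal at this SNR
    print(target, round(measured_snr_db(speech, scaled), 6))
```

With a perceptual weighting, `mean_power` would be replaced by the weighted energy measure; the gain formula stays the same.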

Data samples were also recorded in a quiet room, an
office, and a Bell Jet Ranger helicopter. Of course, in
these cases the channel and SNR were not known, since two
microphones were used to record the reference noise and the
noisy signal simultaneously.
RESULTS


The experiments performed yielded encouraging, but not
decisive, results. In the artificially contrived experiments,
each of the channels was estimated very closely when the
reference noise was Gaussian; for the other noise types,
there were bands where the estimate was good and bands where
it was poor. This does not imply that significant noise
reduction was not achieved; it simply implies that the
channel estimate was less accurate for these cases than for
the Gaussian case. We believe this is caused by the
violation of two assumptions required for convergence to the
optimal channel. The first is that successive noise vectors
$V_j$ are uncorrelated; for nearly periodic noise, this is
clearly not true. The second is that an infinite sample size
is available; this is also not true, though it appears to be
the less important of the two problems.

Additional observations of interest include the fact
that updating $H_j$ on a point-by-point basis was much less
irritating than updating less frequently. Since this
practice forces successive $V_j$ to be highly correlated, it
is interesting to note that correlation of successive $V_j$
is not sufficient to prevent convergence of the channel
estimate; the type of correlation is also very important.
It is also of significant interest that a relatively
fast adaptation rate seems to be preferred over a slow
adaptation rate with a smaller residual error. That is, the
open, echo-like quality induced by rapid adaptation is more
pleasing than the loss of intelligibility in the first few
words caused by slow adaptation. This is partly because the
echo disappears only slowly as the adaptation rate is
decreased. Significantly, speech signals processed in this
way seem to work very well with LPC.

Another interesting result is that the signal estimates
for the 0 dB SNR signals are very similar to those for the
20 dB and 40 dB SNR signals. The change in the 40 dB SNR
signals should probably be classified as a slight
degradation, while the other two cases should be classified
as major improvements.

Initial experiments on data recorded in actual settings
have been less successful. While slight noise reductions
have been observed, the failure of the channel estimate to
converge at all has spurred further experiments to determine
the major deficiencies of the simple model. Additional
mutually uncorrelated noise at both inputs is an obvious
source of error, but it is not known at this time how large
its contribution is. Other possible sources of error
presently under investigation are multiple noise sources,
non-linear channels, and the need for subsequent processing
of the signal estimate using conventional techniques.
CONCLUSIONS


Many experiments have been performed, and more are now
under way, which suggest that the adaptive noise
cancellation algorithm holds great promise for significant
noise suppression in speech signals. While certain
deficiencies have been observed for periodic noise,
significant noise reduction has also been achieved. Although
the signal estimates are sometimes excellent (see Figures
V.5 and V.6), a very slight degradation of the speech seems
to be common. Perhaps most important is the realization that,
for real speech data, the adaptive noise cancellation
algorithm has demonstrated great potential for success but
cannot yet claim a history of success.
Figure V.5

Signal examples from region where H has converged
a) Original signal b) Noisy signal c) Signal estimate
Figure V.6

Spectra of signals in Figure V.5
a) Original signal b) Noisy signal c) Signal estimate